To Donna




Preface 



This book is a thorough introduction to linear algebra, for the graduate 
or advanced undergraduate student. Prerequisites are limited to a 
knowledge of the basic properties of matrices and determinants. 
However, since we cover the basics of vector spaces and linear 
transformations rather rapidly, a prior course in linear algebra (even at 
the sophomore level), along with a certain measure of “mathematical 
maturity,” is highly desirable. 

Chapter 0 contains a summary of certain topics in modern algebra 
that are required for the sequel. This chapter should be skimmed 
quickly and then used primarily as a reference. Chapters 1-3 contain a 
discussion of the basic properties of vector spaces and linear 
transformations. 

Chapter 4 is devoted to a discussion of modules, emphasizing a 
comparison between the properties of modules and those of vector 
spaces. Chapter 5 provides more on modules. The main goals of this 
chapter are to prove that any two bases of a free module have the same 
cardinality and to introduce noetherian modules. However, the 
instructor may simply skim over this chapter, omitting all proofs. 
Chapter 6 is devoted to the theory of modules over a principal ideal 
domain, establishing the cyclic decomposition theorem for finitely 
generated modules. This theorem is the key to the structure theorems 
for finite dimensional linear operators, discussed in Chapters 7 and 8. 

Chapter 9 is devoted to real and complex inner product spaces. 
The emphasis here is on the finite-dimensional case, in order to arrive 
as quickly as possible at the finite-dimensional spectral theorem for 
normal operators, in Chapter 10. However, we have endeavored to
state as many results as is convenient for vector spaces of arbitrary
dimension.

The second part of the book consists of a collection of independent 
topics, with the one exception that Chapter 13 requires Chapter 12. 
Chapter 11 is on metric vector spaces, where we describe the structure 
of symplectic and orthogonal geometries over various base fields. 
Chapter 12 contains enough material on metric spaces to allow a unified 
treatment of topological issues for the basic Hilbert space theory of 
Chapter 13. The rather lengthy proof that every metric space can be
embedded in its completion may be omitted.

Chapter 14 contains a brief introduction to tensor products. In 
order to motivate the universal property of tensor products, without 
getting too involved in categorical terminology, we first treat both free 
vector spaces and the familiar direct sum, in a universal way. Chapter 
15 is on affine geometry, emphasizing algebraic, rather than geometric, 
concepts. 

The final chapter provides an introduction to a relatively new 
subject, called the umbral calculus. This is an algebraic theory used to 
study certain types of polynomial functions that play an important role 
in applied mathematics. We give only a brief introduction to the 
subject — emphasizing the algebraic aspects, rather than the 
applications. This is the first time that this subject has appeared in a 
true textbook. 

One final comment. Unless otherwise mentioned, omission of a 
proof in the text is a tacit suggestion that the reader attempt to supply 
one. 



Steven Roman 



Irvine, Ca. 




Contents

Preface

Chapter 0

Preliminaries

Part 1: Preliminaries. Matrices. Determinants. Polynomials.
Functions. Equivalence Relations. Zorn’s Lemma. Cardinality.

Part 2: Algebraic Structures. Groups. Rings. Integral Domains.
Ideals and Principal Ideal Domains. Prime Elements. Fields. The
Characteristic of a Ring.

Part 1 Basic Linear Algebra

Chapter 1

Vector Spaces

Vector Spaces. Subspaces. The Lattice of Subspaces. Direct Sums.
Spanning Sets and Linear Independence. The Dimension of a Vector
Space. The Row and Column Space of a Matrix. Coordinate Matrices.
Exercises.

Chapter 2

Linear Transformations

Linear Transformations. The Kernel and Image of a Linear
Transformation. Isomorphisms. The Rank Plus Nullity Theorem.
Linear Transformations from $F^n$ to $F^m$. Change of Basis Matrices.
The Matrix of a Linear Transformation. Change of Bases for Linear
Transformations. Equivalence of Matrices. Similarity of Matrices.
Invariant Subspaces and Reducing Pairs. Exercises.




Chapter 3

The Isomorphism Theorems

Quotient Spaces. The First Isomorphism Theorem. The Dimension of
a Quotient Space. Additional Isomorphism Theorems. Linear
Functionals. Dual Bases. Reflexivity. Annihilators. Operator
Adjoints. Exercises.

Chapter 4

Modules I

Motivation. Modules. Submodules. Direct Sums. Spanning Sets.
Linear Independence. Homomorphisms. Free Modules. Summary.
Exercises.

Chapter 5

Modules II

Quotient Modules. Quotient Rings and Maximal Ideals. Noetherian
Modules. The Hilbert Basis Theorem. Exercises.

Chapter 6

Modules over Principal Ideal Domains

Free Modules over a Principal Ideal Domain. Torsion Modules. The
Primary Decomposition Theorem. The Cyclic Decomposition Theorem
for Primary Modules. Uniqueness. The Cyclic Decomposition
Theorem. Exercises.

Chapter 7

The Structure of a Linear Operator

A Brief Review. The Module Associated with a Linear Operator.
Submodules and Invariant Subspaces. Orders and the Minimal
Polynomial. Cyclic Submodules and Cyclic Subspaces. Summary. The
Decomposition of V. The Rational Canonical Form. Exercises.




Chapter 8

Eigenvalues and Eigenvectors

The Characteristic Polynomial of an Operator. Eigenvalues and
Eigenvectors. The Cayley-Hamilton Theorem. The Jordan Canonical
Form. Geometric and Algebraic Multiplicities. Diagonalizable
Operators. Projections. The Algebra of Projections. Resolutions of the
Identity. Projections and Diagonalizability. Projections and Invariance.
Exercises.

Chapter 9

Real and Complex Inner Product Spaces

Introduction. Norm and Distance. Isometries. Orthogonality.
Orthogonal and Orthonormal Sets. The Projection Theorem. The
Gram-Schmidt Orthogonalization Process. The Riesz Representation
Theorem. Exercises.

Chapter 10

The Spectral Theorem for Normal Operators

The Adjoint of a Linear Operator. Orthogonal Diagonalizability.
Motivation. Self-Adjoint Operators. Unitary Operators. Normal
Operators. Orthogonal Diagonalization. Orthogonal Projections.
Orthogonal Resolutions of the Identity. The Spectral Theorem.
Functional Calculus. Positive Operators. The Polar Decomposition of
an Operator. Exercises.

Part 2 Topics

Chapter 11

Metric Vector Spaces

Symmetric, Skew-symmetric and Alternate Forms. The Matrix of a
Bilinear Form. Quadratic Forms. Linear Functionals. Orthogonality.
Orthogonal Complements. Orthogonal Direct Sums. Quotient Spaces.
Symplectic Geometry-Hyperbolic Planes.
Orthogonal Geometry-Orthogonal Bases. The Structure of an
Orthogonal Geometry. Isometries. Symmetries. Witt’s Cancellation
Theorem. Witt’s Extension Theorem. Maximum Hyperbolic Subspaces.
Exercises.




Chapter 12

Metric Spaces

The Definition. Open and Closed Sets. Convergence in a Metric Space.
The Closure of a Set. Dense Subsets. Continuity. Completeness.
Isometries. The Completion of a Metric Space. Exercises.

Chapter 13

Hilbert Spaces

A Brief Review. Hilbert Spaces. Infinite Series. An Approximation
Problem. Hilbert Bases. Fourier Expansions. A Characterization of
Hilbert Bases. Hilbert Dimension. A Characterization of Hilbert
Spaces. The Riesz Representation Theorem. Exercises.

Chapter 14

Tensor Products

Free Vector Spaces. Another Look at the Direct Sum. Bilinear Maps
and Tensor Products. Properties of the Tensor Product. The Tensor
Product of Linear Transformations. Change of Base Field. Multilinear
Maps and Iterated Tensor Products. Alternating Maps and Exterior
Products. Exercises.

Chapter 15

Affine Geometry

Affine Geometry. Affine Combinations. Affine Hulls. The Lattice of
Flats. Affine Independence. Affine Transformations. Projective
Geometry. Exercises.

Chapter 16

The Umbral Calculus

Formal Power Series. The Umbral Algebra. Formal Power Series as
Linear Operators. Sheffer Sequences. Examples of Sheffer Sequences.
Umbral Operators and Umbral Shifts. Continuous Operators on the
Umbral Algebra. Operator Adjoints. Automorphisms of the Umbral
Algebra. Derivations of the Umbral Algebra. Exercises.

References
Index of Notation
Index




CHAPTER 0 

Preliminaries 



In this chapter, we briefly discuss some topics that are needed for the 
sequel. This chapter should be skimmed quickly and then used primarily 
as a reference. 

Contents: Part 1: Preliminaries. Matrices. Determinants.
Polynomials. Functions. Equivalence Relations. Zorn’s Lemma.
Cardinality. Part 2: Algebraic Structures. Groups. Rings. Integral
Domains. Ideals and Principal Ideal Domains. Prime Elements.
Fields. The Characteristic of a Ring.



Part 1 Preliminaries 

Matrices 

If F is a field, we let $\mathcal{M}_{m,n}(F)$ denote the set of all
$m \times n$ matrices whose entries lie in F. When no confusion can
arise, we denote this set simply by $\mathcal{M}_{m,n}$. The set
$\mathcal{M}_{n,n}(F)$ will be denoted by $\mathcal{M}_n(F)$ or
$\mathcal{M}_n$.

We expect that the reader is familiar with the basic properties of
matrices, including matrix addition and multiplication. If
$A \in \mathcal{M}_{m,n}$, the (i,j)-th entry of A will be denoted by
$A_{i,j}$. The identity matrix of size $n \times n$ is denoted by $I_n$.

Definition The transpose of $A \in \mathcal{M}_{m,n}$ is the
$n \times m$ matrix $A^T$ defined by

$(A^T)_{i,j} = A_{j,i}$

A matrix A is symmetric if $A = A^T$ and skew-symmetric if
$A^T = -A$. □

Theorem 0.1 (Properties of the transpose) Let
$A, B \in \mathcal{M}_{m,n}$. Then

1) $(A^T)^T = A$

2) $(A + B)^T = A^T + B^T$

3) $(rA)^T = rA^T$, for all $r \in F$

4) $(AB)^T = B^T A^T$, provided that the product AB is defined

5) $\det(A^T) = \det(A)$. ■

Recall that there are three types of elementary row operations. 
Type 1 operations consist of multiplying a row of A by a nonzero 
scalar (that is, an element of F). Type 2 operations consist of 
interchanging two rows of A. Type 3 operations consist of adding a 
scalar multiple of one row of A to another row of A. 

If we perform an elementary operation of type k (k = 1, 2 or 3) on
an identity matrix $I_n$, we get an elementary matrix of type k. It is
easy to see that all elementary matrices are invertible.

If A has size $m \times n$, then in order to perform an elementary row
operation on A, we may instead perform that operation on the identity
$I_m$, to obtain an elementary matrix E, and then take the product EA.
Note that we must multiply A on the left by E, since multiplying on
the right has the effect of performing column operations.
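
The statement that EA performs the row operation on A can be checked
numerically. The following is a minimal sketch over the rationals; the
helper names `identity`, `matmul` and `add_multiple` are illustrative
choices, not notation from the text, and only a type 3 operation is
shown.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add_multiple(M, r, src, dst):
    """Type 3 operation: add r times row src to row dst (returns a copy)."""
    N = [row[:] for row in M]
    N[dst] = [a + r * b for a, b in zip(N[dst], N[src])]
    return N

# A is 2 x 3, so the elementary matrix is built from the 2 x 2 identity.
A = [[Fraction(1), Fraction(2), Fraction(3)],
     [Fraction(4), Fraction(5), Fraction(6)]]

E = add_multiple(identity(2), Fraction(-4), 0, 1)   # elementary matrix
left = matmul(E, A)                                  # the product EA
direct = add_multiple(A, Fraction(-4), 0, 1)         # operation applied to A
```

Here `left` and `direct` agree, which is the content of the remark for
this one operation.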

Definition A matrix R is said to be in reduced row echelon form if 

1) All rows consisting only of 0s appear at the bottom of the matrix.

2) In any nonzero row, the first nonzero entry is a 1. This entry is
called a leading entry.

3) For any two consecutive rows, the leading entry of the lower row
is to the right of the leading entry of the upper row.

4) Any column that contains a leading entry has 0s in all other
positions. □

Here are the basic facts concerning reduced row echelon form.

Theorem 0.2 Two matrices A and B in $\mathcal{M}_{m,n}$ are row
equivalent if one can be obtained from the other by a series of
elementary row operations. We denote this by $A \sim B$.

1) Row equivalence is an equivalence relation. That is,

a) $A \sim A$

b) $A \sim B \Rightarrow B \sim A$

c) $A \sim B$, $B \sim C \Rightarrow A \sim C$.

2) Any matrix A is row equivalent to one and only one matrix R
that is in reduced row echelon form. The matrix R is called the
reduced row echelon form of A. Furthermore, we have

$A = E_1 \cdots E_k R$

where $E_1, \dots, E_k$ are elementary matrices, namely the inverses of
those used to reduce A to reduced row echelon form.

3) A is invertible if and only if R is an identity matrix. Hence, a
matrix is invertible if and only if it is the product of elementary
matrices. ■
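
As an illustration of parts 2 and 3, here is a short sketch of row
reduction over the rationals using only the three types of elementary
row operations. The function names `rref` and `is_invertible` are our
own, and exact arithmetic with `Fraction` is an assumption made to
avoid rounding.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows)."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a nonzero entry in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, rows) if A[r][col] != 0),
                     None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]      # type 2
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]      # type 1
        for r in range(rows):                                # type 3
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y
                        for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return A

def is_invertible(M):
    """A square matrix is invertible iff its rref is the identity."""
    n = len(M)
    return rref(M) == [[int(i == j) for j in range(n)] for i in range(n)]
```

For example, `rref([[1, 2], [2, 4]])` has a zero row, so that matrix is
not invertible, while `[[1, 2], [3, 4]]` reduces to the identity.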



Determinants 

We assume that the reader is familiar with the following basic 

properties of determinants. 

Theorem 0.3 Let A be an n x n matrix over F. Then det(A) is an 

element of F. Furthermore, 

1) $\det(AB) = \det(A)\det(B)$, for any $B \in \mathcal{M}_n(F)$.

2) A is nonsingular (invertible) if and only if $\det(A) \neq 0$.

3) The determinant of an upper triangular, or lower triangular,
matrix is the product of the entries on its main diagonal.

4) Let A(i,j) denote the matrix obtained by deleting the ith row and
jth column from A. The adjoint of A is the matrix adj(A)
defined by

$(\operatorname{adj}(A))_{i,j} = (-1)^{i+j} \det(A(j,i))$

If A is invertible, then

$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)$ ■
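
The adjoint formula for the inverse, $A^{-1} = \operatorname{adj}(A)/\det(A)$,
can be tried on a small integer matrix. The sketch below is only an
illustration: the names `minor`, `det`, `adj` and `inverse` are ours,
indices are 0-based, and the determinant is computed by cofactor
expansion, which is far too slow for large matrices.

```python
from fractions import Fraction

def minor(A, i, j):
    """Return A(i, j): delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def adj(A):
    """Adjoint of A: the (i, j) entry is (-1)^(i+j) det(A(j, i))."""
    n = len(A)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Inverse of an invertible integer matrix as adj(A) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, d) for entry in row] for row in adj(A)]

A = [[1, 2, 3],
     [0, 1, 4],
     [5, 6, 0]]
A_inv = inverse(A)
```

Multiplying `A` by `A_inv` entry by entry gives the identity matrix,
as the formula predicts.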



Polynomials 

If F is a field, then F[x] denotes the set of all polynomials in
the variable x, with coefficients from F. If $p(x) \in F[x]$, we say that
p(x) is a polynomial over F. If

$p(x) = a_0 + a_1 x + \cdots + a_n x^n$

is a polynomial, with $a_n \neq 0$, then $a_n$ is called the leading
coefficient of p(x), and the degree deg p(x) of p(x) is n. We will set
the degree of the zero polynomial to $-\infty$. A polynomial is monic if
its leading coefficient is 1.

Theorem 0.4 (Division algorithm) Let $f(x) \in F[x]$ and $g(x) \in F[x]$,
where deg g(x) > 0. Then there exist unique polynomials q(x) and
r(x) in F[x] for which

$f(x) = q(x)g(x) + r(x)$

where r(x) = 0 or $0 \leq \deg r(x) < \deg g(x)$. ■

If p(x) divides q(x), that is, if there exists a polynomial f(x)
for which

$q(x) = f(x)p(x)$

then we write $p(x) \mid q(x)$.
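
The quotient and remainder of Theorem 0.4 can be computed by long
division. The following sketch works over the rationals and stores a
polynomial $a_0 + a_1 x + \cdots + a_n x^n$ as the list
`[a_0, a_1, ..., a_n]`; this representation and the names `trim` and
`poly_divmod` are assumptions made for the example. A zero remainder
means that g(x) divides f(x).

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g, over Q."""
    f = trim(Fraction(c) for c in f)
    g = trim(Fraction(c) for c in g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        # Subtract coeff * x^shift * g(x); the leading term of r cancels.
        for i, c in enumerate(g):
            r[shift + i] -= coeff * c
        r = trim(r)
    return trim(q), r

# x^2 + 1 = (x - 1)(x + 1) + 2
q1, r1 = poly_divmod([1, 0, 1], [1, 1])
# x^3 - 2x + 1 = (x^2 + x - 1)(x - 1) + 0, so (x - 1) divides it
q2, r2 = poly_divmod([1, -2, 0, 1], [-1, 1])
```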

Theorem 0.5 Let f(x) and g(x) be polynomials over F. The
greatest common divisor of f(x) and g(x), denoted by gcd(f(x),g(x)),
is the unique monic polynomial p(x) over F for which

1) $p(x) \mid f(x)$ and $p(x) \mid g(x)$

2) if $r(x) \mid f(x)$ and $r(x) \mid g(x)$, then $r(x) \mid p(x)$.

Furthermore, there exist polynomials a(x) and b(x) over F for
which

$\gcd(f(x),g(x)) = a(x)f(x) + b(x)g(x)$ ■

Definition Let f(x) and g(x) be polynomials over F. If
gcd(f(x),g(x)) = 1, we say that f(x) and g(x) are relatively prime. In
particular, f(x) and g(x) are relatively prime if and only if there exist
polynomials a(x) and b(x) over F for which

$a(x)f(x) + b(x)g(x) = 1$ □

Definition A nonconstant polynomial $f(x) \in F[x]$ is irreducible if
whenever f(x) = p(x)q(x), then one of p(x) or q(x) must be
constant. □

The following two theorems support the view that irreducible
polynomials behave like prime numbers.

Theorem 0.6 If f(x) is irreducible and $f(x) \mid p(x)q(x)$, then either
$f(x) \mid p(x)$ or $f(x) \mid q(x)$. □

Theorem 0.7 Every nonconstant polynomial in F[x] can be written as
a product of irreducible polynomials. Moreover, this expression is
unique up to order of the factors and multiplication by a scalar. □



Functions 

To set our notation, we should make a few comments about 
functions. 

Definition Let $f\colon S \to T$ be a function (map) from a set S to a
set T.

1) The domain of f is the set S.

2) The image or range of f is the set
$\operatorname{im}(f) = \{f(s) \mid s \in S\}$.

3) f is injective (one-to-one), or an injection, if
$x \neq y \Rightarrow f(x) \neq f(y)$.

4) f is surjective (onto T), or a surjection, if
$\operatorname{im}(f) = T$.

5) f is bijective, or a bijection, if it is both injective and
surjective. □

If $f\colon S \to T$ is injective, then its inverse
$f^{-1}\colon \operatorname{im}(f) \to S$ exists and is well-defined. It
will be convenient to apply f and $f^{-1}$ to subsets of S and T. In
particular, if $X \subseteq S$, we set $f(X) = \{f(x) \mid x \in X\}$ and
if $Y \subseteq T$, we set $f^{-1}(Y) = \{s \in S \mid f(s) \in Y\}$. Note
that the latter is defined even if f is not injective.
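
For finite sets, the sets f(X) and $f^{-1}(Y)$ are easy to compute
directly. The sketch below uses the squaring map on a small set of
integers as a sample non-injective function; the helper names are
illustrative only.

```python
def image(f, X):
    """f(X) = {f(x) : x in X}."""
    return {f(x) for x in X}

def preimage(f, S, Y):
    """f^{-1}(Y) = {s in S : f(s) in Y}; defined for any map f on S."""
    return {s for s in S if f(s) in Y}

def is_injective(f, S):
    """f is injective on the finite set S iff no two elements collide."""
    return len(image(f, S)) == len(set(S))

S = {-2, -1, 0, 1, 2}

def square(s):
    return s * s
```

With these definitions, `preimage(square, S, {1, 4})` returns
`{-2, -1, 1, 2}` even though `square` has no inverse function on S.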

If $X \subseteq S$, the restriction of $f\colon S \to T$ to X is the
function $f|_X\colon X \to T$. Clearly, the restriction of an injective
map is injective.



Equivalence Relations 

The concept of an equivalence relation plays a major role in the 
study of matrices and linear transformations. 

Definition Let S be a nonempty set. A binary relation $\sim$ on S is
called an equivalence relation on S if it satisfies the following
conditions.

1) (reflexivity)

$a \sim a$

for all $a \in S$.

2) (symmetry)

$a \sim b \Rightarrow b \sim a$

for all $a, b \in S$.

3) (transitivity)

$a \sim b$, $b \sim c \Rightarrow a \sim c$

for all $a, b, c \in S$. □

Definition Let $\sim$ be an equivalence relation on S. For $a \in S$,
the set

$[a] = \{b \in S \mid b \sim a\}$

is called the equivalence class of a. □

Theorem 0.8 Let $\sim$ be an equivalence relation on S. Then

1) $b \in [a] \Leftrightarrow a \in [b] \Leftrightarrow [a] = [b]$

2) For any $a, b \in S$, we have either $[a] = [b]$ or
$[a] \cap [b] = \emptyset$. ■

Definition Let S be a nonempty set. A partition of S is a collection
$\{A_1, \dots, A_n\}$ of nonempty subsets of S, called blocks, for which

1) $A_i \cap A_j = \emptyset$, for all $i \neq j$

2) $S = A_1 \cup \cdots \cup A_n$. □

The following theorem sheds considerable light on the concept of
an equivalence relation.

Theorem 0.9

1) Let $\sim$ be an equivalence relation on S. Then the distinct
equivalence classes with respect to $\sim$ are the blocks of a
partition of S.

2) Conversely, if $\mathcal{P}$ is a partition of S, the binary relation
$\sim$ defined by

$a \sim b$ if a and b lie in the same block of $\mathcal{P}$

is an equivalence relation on S, whose equivalence classes are the
blocks of $\mathcal{P}$.

This establishes a one-to-one correspondence between equivalence
relations on S and partitions of S. ■
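
Both directions of this correspondence can be seen on a small finite
example. In the sketch below, congruence modulo 3 on {0, ..., 9} is
our own choice of equivalence relation, and the function names are
illustrative.

```python
def equivalence_classes(S, related):
    """Group the elements of S into classes of the relation `related`."""
    classes = []
    for a in S:
        for block in classes:
            # By transitivity, comparing with one representative suffices.
            if related(a, block[0]):
                block.append(a)
                break
        else:
            classes.append([a])
    return classes

def relation_from_partition(blocks):
    """Return the relation 'a and b lie in the same block'."""
    def related(a, b):
        return any(a in block and b in block for block in blocks)
    return related

S = list(range(10))
classes = equivalence_classes(S, lambda a, b: (a - b) % 3 == 0)
same_block = relation_from_partition(classes)
```

Starting from the relation we obtain the blocks `classes`, and the
relation `same_block` built from those blocks has the same equivalence
classes again.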

The most important problem related to equivalence relations is 
that of finding an efficient way to determine when two elements are 
equivalent. Unfortunately, in most cases, the definition does not 
provide an efficient test for equivalence, and so we are led to the 
following concepts. 

Definition Let $\sim$ be an equivalence relation on S. A function
$f\colon S \to T$, where T is any set, is called an invariant of $\sim$ if

$a \sim b \Rightarrow f(a) = f(b)$

A function $f\colon S \to T$ is a complete invariant if

$a \sim b \Leftrightarrow f(a) = f(b)$

A collection $f_1, \dots, f_n$ of invariants is called a complete system
of invariants if

$a \sim b \Leftrightarrow f_i(a) = f_i(b)$ for all $i = 1, \dots, n$ □

Definition Let $\sim$ be an equivalence relation on S. A subset
$C \subseteq S$ is said to be a set of canonical forms for $\sim$ if for
every $s \in S$, there is exactly one $c \in C$ such that $c \sim s$. □

Example 0.1 Define a binary relation $\sim$ on F[x] by letting
$p(x) \sim q(x)$ if and only if there exists a nonzero constant
$a \in F$ such that p(x) = aq(x). This is easily seen to be an
equivalence relation. The function that assigns to each polynomial its
degree is an invariant, since

$p(x) \sim q(x) \Rightarrow \deg(p(x)) = \deg(q(x))$

However, it is not a complete invariant, since there are inequivalent
polynomials with the same degree. The set of all monic polynomials is
a set of canonical forms for this equivalence relation. □

Example 0.2 We have remarked that row equivalence is an equivalence
relation on $\mathcal{M}_{m,n}$. Moreover, the subset of reduced row
echelon form matrices is a set of canonical forms for row equivalence,
since every matrix is row equivalent to a unique matrix in reduced row
echelon form. □

Example 0.3 Two matrices $A, B \in \mathcal{M}_{m,n}$ are row equivalent
if and only if there is an invertible matrix P such that A = PB.
Similarly, A and B are column equivalent (that is, A can be reduced
to B using elementary column operations) if and only if there exists an
invertible matrix Q such that A = BQ.

Two matrices A and B are said to be equivalent if there exist
invertible matrices P and Q for which

A = PBQ

Put another way, A and B are equivalent if A can be reduced to B
by performing a series of elementary row and/or column operations.
(The use of the term equivalent is unfortunate, since it applies to all
equivalence relations, not just this one. However, the terminology is
standard, so we use it here.)

It is not hard to see that a square matrix R that is in both
reduced row echelon form and reduced column echelon form must have
the form

$J_k = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}$

with 0s everywhere off the main diagonal, and k 1s, followed by
$n - k$ 0s, on the main diagonal.

We leave it to the reader to show that every matrix A in
$\mathcal{M}_n$ is equivalent to exactly one matrix of the form $J_k$,
and so the set of these matrices is a set of canonical forms for
equivalence. Moreover, the function f defined by f(A) = k, where
$A \sim J_k$, is a complete invariant for equivalence.

Since the rank of $J_k$ is k, and since neither row nor column
operations affect the rank, we deduce that the rank of A is k. Hence,
rank is a complete invariant for equivalence. □

Example 0.4 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be
similar if there exists an invertible matrix P such that

$A = PBP^{-1}$

Similarity is easily seen to be an equivalence relation on
$\mathcal{M}_n$. As we will learn, two matrices are similar if and only
if they represent the same linear operator on a given n-dimensional
vector space V, with respect to possibly different bases. Hence,
similarity is extremely important for studying the structure of linear
operators. One of the main goals of this book is to develop canonical
forms for similarity.

We leave it to the reader to show that the determinant function
and the trace function are invariants for similarity. However, these two
invariants do not, in general, form a complete system of invariants. □
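
A concrete instance of the last remark, for 2 x 2 matrices: the
identity matrix and a shear have the same determinant and trace, yet
they are not similar, since $PIP^{-1} = I$ for every invertible P. The
sketch below checks this with exact arithmetic; the particular matrices
and the helper names are our own choices for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse2(P):
    """Inverse of an invertible 2 x 2 integer matrix."""
    d = det2(P)
    if d == 0:
        raise ValueError("P is not invertible")
    return [[Fraction(P[1][1], d), Fraction(-P[0][1], d)],
            [Fraction(-P[1][0], d), Fraction(P[0][0], d)]]

def conjugate(P, B):
    """Return P B P^{-1}."""
    return matmul(matmul(P, B), inverse2(P))

I2 = [[1, 0], [0, 1]]
B = [[1, 1], [0, 1]]          # a shear: same trace and determinant as I2
P = [[2, 1], [1, 1]]          # an invertible matrix (det = 1)

A = conjugate(P, B)           # similar to B, so trace and det agree with B
```

Conjugating `I2` by any invertible `P` returns `I2`, so `B` cannot be
similar to `I2` although both have trace 2 and determinant 1.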

Example 0.5 Two matrices $A, B \in \mathcal{M}_n(F)$ are said to be
congruent if there exists an invertible matrix P for which

$A = PBP^T$

where $P^T$ is the transpose of P. This relation is easily seen to be an
equivalence relation, and we will devote some effort to finding canonical
forms for congruence. For some base fields F (such as $\mathbb{R}$,
$\mathbb{C}$ or a finite field), this is relatively easy to do, but for
other base fields (such as $\mathbb{Q}$), it is extremely difficult. □

Zorn’s Lemma

In order to show that any vector space has a basis, we require a 
result known as Zorn’s lemma. To state this lemma, we need some 
preliminary definitions. 

Definition A partially ordered set is a nonempty set P, together with
a partial order defined on P. A partial order is a binary relation,
denoted by $\leq$ and read “less than or equal to,” with the following
properties.

1) (reflexivity) For all $a \in P$,

$a \leq a$

2) (antisymmetry) For all $a, b \in P$,

$a \leq b$ and $b \leq a$ implies $a = b$

3) (transitivity) For all $a, b, c \in P$,

$a \leq b$ and $b \leq c$ implies $a \leq c$ □

Definition If P is a partially ordered set and if $m \in P$ has the
property that $m \leq p$ implies $m = p$, then m is called a maximal
element in P. □

Definition Let P be a partially ordered set and let $a, b \in P$. If
there is a $u \in P$ with the property that

1) $a \leq u$ and $b \leq u$, and

2) if $a \leq x$ and $b \leq x$, then $u \leq x$

then we say that u is the least upper bound of a and b, and write
u = lub{a,b}. If there is an element $\ell \in P$ with the property that

3) $\ell \leq a$ and $\ell \leq b$, and

4) if $x \leq a$ and $x \leq b$, then $x \leq \ell$

then we say that $\ell$ is the greatest lower bound of a and b, and
write $\ell$ = glb{a,b}. □

Note that in a partially ordered set, it is possible that not all
elements are comparable. In other words, it is possible to have
$x, y \in P$ with the property that $x \not\leq y$ and $y \not\leq x$. A
partially ordered set in which every pair of elements is comparable is
called a totally ordered set, or a linearly ordered set. Any totally
ordered subset of a partially ordered set P is called a chain in P.

Let S be a subset of a partially ordered set P. We say that an
element $u \in P$ is an upper bound for S if $s \leq u$ for all
$s \in S$.

Example 0.6

1) The set $\mathbb{R}$ of real numbers, with the usual binary relation
$\leq$, is a partially ordered set. It is also a totally ordered set.
It has no maximal element.

2) The set $\mathbb{N}$ of natural numbers, together with the binary
relation of divides, is a partially ordered set. It is customary to
write $n \mid m$ to indicate that n divides m. The subset S of
$\mathbb{N}$ consisting of all powers of 2 is a totally ordered subset
of $\mathbb{N}$, that is, it is a chain in $\mathbb{N}$. The set
P = {2,4,8,3,9,27} is a partially ordered set under $\mid$. It has two
maximal elements, namely 8 and 27.

3) Let S be any set, and let $\mathcal{P}(S)$ be the power set of S,
that is, the set of all subsets of S. Then $\mathcal{P}(S)$, together
with the subset relation $\subseteq$, is a partially ordered set. □
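
The claims in part 2 of this example can be verified by brute force on
finite sets. The following sketch is only an illustration; the function
names are ours, and `leq` stands for whatever partial order is being
tested.

```python
def divides(n, m):
    """The partial order n | m on positive integers."""
    return m % n == 0

def maximal_elements(P, leq):
    """Elements m of P such that leq(m, p) implies m == p."""
    return [m for m in P if all(not leq(m, p) or m == p for p in P)]

def is_chain(C, leq):
    """A chain is a subset in which every two elements are comparable."""
    return all(leq(a, b) or leq(b, a) for a in C for b in C)

P = [2, 4, 8, 3, 9, 27]
maximal = maximal_elements(P, divides)
powers_of_two = [2 ** k for k in range(8)]
```

Here `maximal` lists 8 and 27, the powers of 2 form a chain, and P
itself is not a chain because 2 and 3 are not comparable.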

Now we can state Zorn’s lemma.

Theorem 0.10 (Zorn’s lemma) Let P be a partially ordered set in
which every chain has an upper bound. Then P has a maximal
element. ■

The reader who is interested in looking at an example of the use of
Zorn’s lemma now might wish to refer to the proof in Chapter 1 that
every vector space has a basis.

Cardinality 

We will say that two sets S and T have the same cardinality, 
and write 

|S| = |T| 

if there is a bijective function (a one-to-one correspondence) between the 
sets. The reader is probably aware of the fact that 

$|\mathbb{Z}| = |\mathbb{N}|$ and $|\mathbb{Q}| = |\mathbb{N}|$

where N, Z and Q are the natural numbers, integers, and rational 
numbers, respectively. 
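
To make the first of these equalities concrete, here is one explicit
bijection between the natural numbers and the integers, together with
its inverse. We assume for this sketch that the natural numbers start
at 0; the function names are our own.

```python
def nat_to_int(n):
    """Send 0, 1, 2, 3, 4, ... to 0, -1, 1, -2, 2, ..."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def int_to_nat(z):
    """Inverse map: nonnegative integers go to evens, negatives to odds."""
    return 2 * z if z >= 0 else -2 * z - 1

first_values = [nat_to_int(n) for n in range(7)]
```

The two functions undo each other, which is what it means for
`nat_to_int` to be a one-to-one correspondence.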

If S is in one-to-one correspondence with a subset of T, we write
$|S| \leq |T|$. If S is in one-to-one correspondence with a proper
subset of T, and if $|S| \neq |T|$, we write $|S| < |T|$. The second
condition is necessary, since, for instance, $\mathbb{N}$ is in
one-to-one correspondence with a proper subset of $\mathbb{Z}$, and yet
$|\mathbb{N}| \not< |\mathbb{Z}|$.

This is not the place to enter into a detailed discussion of cardinal 
numbers. The intention here is that the cardinality of a set, whatever 
that is, represents the “size” of the set, and it happens that it is much 
easier to talk about two sets having the same, or different, size 
(cardinality) than it is to explicitly define the size (cardinality) of a 
given set. 

Be that as it may, we associate to each set S a cardinal number, 
denoted by | S | or card{S), that is intended to measure the size of 
the set. Actually, cardinal numbers are just very special types of sets. 
However, we can simply think of them as vague amorphous objects that 
measure the size of sets. 

A set is finite if it can be put in one-to-one correspondence with a
set of the form $\mathbb{Z}_n = \{0, 1, \dots, n-1\}$, for some positive
integer n. The cardinal number (or cardinality) of a finite set is just
the number of elements in the set. The cardinal number of the set
$\mathbb{N}$ of natural numbers is $\aleph_0$ (read “aleph nought”),
where $\aleph$ is the first letter of the Hebrew alphabet. Hence,

$|\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0$

Any set with cardinality $\aleph_0$ is called a countably infinite set,
and any finite or countably infinite set is called a countable set.

Since it can be shown that $|\mathbb{R}| > |\mathbb{N}|$, the real
numbers are not countable.

If S and T are finite sets, then it is well known that

$|S| \leq |T|$ and $|T| \leq |S| \Rightarrow |S| = |T|$

The first part of the next theorem tells us that this is also true for
infinite sets.

The reader will no doubt recall that the power set $\mathcal{P}(S)$ of
a set S is the set of all subsets of S. For finite sets, the power set
of S is always bigger than the set itself. In fact,

$|S| = n \Rightarrow |\mathcal{P}(S)| = 2^n$

The second part of the next theorem says that the power set of any set
S is bigger than S itself. On the other hand, the third part of this
theorem says that, for infinite sets S, the set of all finite subsets of
S is the same size as S.

Theorem 0.11

1) (Schröder-Bernstein theorem) For any sets S and T,

$|S| \leq |T|$ and $|T| \leq |S| \Rightarrow |S| = |T|$

2) (Cantor’s theorem) If $\mathcal{P}(S)$ denotes the power set of S,
then

$|S| < |\mathcal{P}(S)|$

3) If $\mathcal{P}_0(S)$ denotes the set of all finite subsets of S, and
if S is an infinite set, then

$|S| = |\mathcal{P}_0(S)|$

Proof. We prove only parts (1) and (2). 

1) To prove the Schröder-Bernstein theorem, we follow the proof of
Halmos [1960]. Let $f\colon S \to T$ be an injective function from S
into T, and let $g\colon T \to S$ be an injective function from T into
S. We want to show that there is a bijective function from S to T. For
this purpose, we make the following definitions. An element $s \in S$
has descendants

$f(s),\ g(f(s)),\ f(g(f(s))),\ \dots$

If t is a descendant of s, then s is an ancestor of t. We define
descendants of t and ancestors of s similarly. Now, by tracing
an element’s ancestry to its beginning, we find that there are three
possibilities: the element may originate in S, or in T, or it may
have no originator. Accordingly, we can write S as the union of
three disjoint sets

$S_S = \{s \in S \mid s \text{ originates in } S\}$

$S_T = \{s \in S \mid s \text{ originates in } T\}$

and

$S_\infty = \{s \in S \mid s \text{ has no originator}\}$

Similarly, we write T as the disjoint union of $T_S$, $T_T$ and
$T_\infty$. Now, the restriction

$f|_{S_S}\colon S_S \to T_S$

is a bijection. For if $t \in T_S$, then $t = f(s')$, for some
$s' \in S$. But s' and t have the same originator, and so
$s' \in S_S$. We leave it to the reader to show that the functions

$g|_{T_T}\colon T_T \to S_T$ and $f|_{S_\infty}\colon S_\infty \to T_\infty$

are also bijections. Putting these three bijections together gives a
bijection between S and T. Hence, $|S| = |T|$.

2) The inclusion map $e\colon S \to \mathcal{P}(S)$ defined by
$e(s) = \{s\}$ is an injection from S to $\mathcal{P}(S)$, and so
$|S| \leq |\mathcal{P}(S)|$. To complete the proof of Cantor’s theorem,
we must show that if $f\colon S \to \mathcal{P}(S)$ is any injection,
then f is not surjective. To this end, let

$X = \{s \in S \mid s \notin f(s)\}$

Then $X \in \mathcal{P}(S)$, and we now show that X is not in
$\operatorname{im}(f)$. For suppose that X = f(x) for some $x \in S$.
Then if $x \in X$, we have by definition of X that $x \notin X$. On the
other hand, if $x \notin X$, we have again by definition of X that
$x \in X$. This contradiction implies that
$X \notin \operatorname{im}(f)$, and so f is not surjective. ■
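
The diagonal set X from the proof of Cantor’s theorem can be computed
explicitly when S is finite. The sketch below takes a three-element
set (our choice), enumerates every function from S to its power set,
and confirms that X is never a value of the function. It illustrates
the argument; it is of course not a proof for infinite sets.

```python
from itertools import combinations, product

def power_set(S):
    """All subsets of S, as frozensets."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]

def diagonal_set(S, f):
    """X = {s in S : s not in f(s)}, for f given as a dict S -> P(S)."""
    return frozenset(s for s in S if s not in f[s])

S = [0, 1, 2]
subsets = power_set(S)

# Every function f: S -> P(S), stored as a dict from elements to subsets.
all_functions = [dict(zip(S, values))
                 for values in product(subsets, repeat=len(S))]

# For each f, the diagonal set X is missing from the image of f.
always_missed = all(diagonal_set(S, f) not in f.values()
                    for f in all_functions)
```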

Now let us define addition, multiplication and exponentiation of
cardinal numbers. If S and T are sets, the cartesian product
$S \times T$ is the set of all ordered pairs

$S \times T = \{(s,t) \mid s \in S,\ t \in T\}$

Also, we let $S^T$ denote the set of all functions from T to S.

Definition Let $\kappa$ and $\lambda$ denote cardinal numbers.

1) The sum $\kappa + \lambda$ is the cardinal number of $S \cup T$,
where S and T are any disjoint sets for which $|S| = \kappa$ and
$|T| = \lambda$.

2) The product $\kappa\lambda$ is the cardinal number of $S \times T$,
where S and T are any sets for which $|S| = \kappa$ and
$|T| = \lambda$.

3) The power $\kappa^\lambda$ is the cardinal number of $S^T$, where S
and T are any sets for which $|S| = \kappa$ and $|T| = \lambda$. □
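
For finite sets, these definitions reduce to ordinary arithmetic, which
can be confirmed by counting. In the sketch below the sets S and T are
arbitrary small examples of our own, and a function from T to S is
stored as a dictionary.

```python
from itertools import product

S = {"a", "b", "c"}          # |S| = 3
T = {0, 1}                   # |T| = 2, and T is disjoint from S

union = S | T                           # sum: the union of disjoint sets
pairs = set(product(S, T))              # product: the cartesian product

# Power: all functions from T to S, each stored as a dict t -> s.
domain = sorted(T)
functions = [dict(zip(domain, values))
             for values in product(sorted(S), repeat=len(domain))]
```

Counting gives 3 + 2 = 5 elements in the union, 3 · 2 = 6 ordered
pairs, and 3^2 = 9 functions from T to S.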

We will not go into the details of why these definitions make
sense. (For instance, they seem to depend on the sets S and T, but in
fact, they do not.) It can be shown, using these definitions, that
cardinal addition and multiplication are associative and commutative,
and that multiplication distributes over addition.

Theorem 0.12 Let $\kappa$, $\lambda$ and $\mu$ be cardinal numbers. Then
the following properties hold.

1) (Associativity)

$\kappa + (\lambda + \mu) = (\kappa + \lambda) + \mu$ and
$\kappa(\lambda\mu) = (\kappa\lambda)\mu$

2) (Commutativity)

$\kappa + \lambda = \lambda + \kappa$ and $\kappa\lambda = \lambda\kappa$

3) (Distributivity)

$\kappa(\lambda + \mu) = \kappa\lambda + \kappa\mu$

4) (Properties of Exponents)

a) $\kappa^{\lambda + \mu} = \kappa^\lambda \kappa^\mu$

b) $(\kappa^\lambda)^\mu = \kappa^{\lambda\mu}$

c) $(\kappa\lambda)^\mu = \kappa^\mu \lambda^\mu$ ■

On the other hand, the arithmetic of cardinal numbers can seem a
bit strange at first.

Theorem 0.13 Let $\kappa$ and $\lambda$ be cardinal numbers, at least
one of which is infinite. Then

1) $\kappa + \lambda = \max\{\kappa,\lambda\}$

2) $\kappa\lambda = \max\{\kappa,\lambda\}$, provided that neither
$\kappa$ nor $\lambda$ is zero ■

It is not hard to see that there is a one-to-one correspondence
between the power set $\mathcal{P}(S)$ of a set S and the set of all
functions from S to {0,1}. This leads to the following theorem.

Theorem 0.14

1) If $|S| = \kappa$ then $|\mathcal{P}(S)| = 2^\kappa$

2) $\kappa < 2^\kappa$ ■

We have already observed that $|\mathbb{N}| = \aleph_0$. It can be shown
that $\aleph_0$ is the smallest infinite cardinal, that is,

$\kappa < \aleph_0 \Rightarrow \kappa$ is a natural number

It can also be shown that the set $\mathbb{R}$ of real numbers is in
one-to-one correspondence with the power set $\mathcal{P}(\mathbb{N})$
of the natural numbers. Therefore,

$|\mathbb{R}| = 2^{\aleph_0}$

The set of all points on the real line is sometimes called the
continuum, and so $2^{\aleph_0}$ is sometimes called the power of the
continuum, and denoted by c.

Theorem 0.13 shows that cardinal addition and multiplication have
a kind of “absorption” quality, which makes it hard to produce larger
cardinals from smaller ones. The next theorem demonstrates this more
dramatically.



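Theorem 0.15

1) Addition and multiplication, applied a finite number of times to
the cardinal number $\aleph_0$, do not yield anything more than $\aleph_0$.
Specifically, for any nonzero $n \in \mathbb{N}$,

\[ n \cdot \aleph_0 = \aleph_0 \quad\text{and}\quad \aleph_0^n = \aleph_0 \]

2) Addition and multiplication, applied a countable number of times
to the cardinal number $2^{\aleph_0}$, do not yield more than $2^{\aleph_0}$.
Specifically, we have

\[ \aleph_0 \cdot 2^{\aleph_0} = 2^{\aleph_0} \quad\text{and}\quad (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0} \] ■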
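
The equation $\aleph_0^2 = \aleph_0$ says that $\mathbb{N} \times \mathbb{N}$ can be put in one-to-one correspondence with $\mathbb{N}$. One standard way to do this (not taken from the text) is the Cantor pairing function; the sketch below assumes $\mathbb{N} = \{0, 1, 2, \ldots\}$, and the names `pair` and `unpair` are ours.

```python
from math import isqrt

def pair(x, y):
    """Cantor pairing: a bijection from N x N onto N, with N = {0, 1, 2, ...}."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of pair: recover (x, y) from z."""
    w = (isqrt(8 * z + 1) - 1) // 2   # index x + y of the diagonal containing z
    y = z - w * (w + 1) // 2          # position of z along that diagonal
    return (w - y, y)

print([pair(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (2, 0)]])
```

Applying `pair` repeatedly gives a bijection from $\mathbb{N}^n$ onto $\mathbb{N}$ for any finite $n \geq 1$.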

Using this theorem, we can establish other relationships, such as

\[ 2^{\aleph_0} \leq (\aleph_0)^{\aleph_0} \leq (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0} \]

which, by the Schröder-Bernstein theorem, implies that

\[ (\aleph_0)^{\aleph_0} = 2^{\aleph_0} \]

We mention that the problem of evaluating $\kappa^\lambda$ in general is a
very difficult one, and would take us far beyond the scope of this book.
We will have use for the following result, whose proof is omitted.



Theorem 0.16 Let $\{A_k \mid k \in K\}$ be a collection of sets, indexed by the
set $K$, with $|K| = \kappa$. If $|A_k| \leq \lambda$ for all $k \in K$, then

\[ \Bigl|\, \bigcup_{k \in K} A_k \Bigr| \leq \lambda\kappa \] ■



Let us conclude by describing the cardinality of some famous sets. 

Theorem 0.17

1) The following sets have cardinality $\aleph_0$.
a) The rational numbers $\mathbb{Q}$.
b) The set of all finite subsets of $\mathbb{N}$.
c) The union of a countable number of countable sets.
d) The set of all ordered n-tuples of integers.

2) The following sets have cardinality $2^{\aleph_0}$.
a) The set of all points in $\mathbb{R}^n$.
b) The set of all infinite sequences of natural numbers.
c) The set of all infinite sequences of real numbers.
d) The set of all finite subsets of $\mathbb{R}$.
e) The set of all irrational numbers. ■



Part 2 Algebraic Structures 



Groups 

Definition A group is a nonempty set $G$, together with a binary
operation denoted by $*$, which satisfies the following properties.

1) (associativity) For all $a, b, c \in G$
\[ (a*b)*c = a*(b*c) \]

2) (identity) There exists an element $e \in G$ for which
\[ e*a = a*e = a \]
for all $a \in G$.

3) (inverses) For each $a \in G$, there is an element $a^{-1} \in G$ for
which
\[ a*a^{-1} = a^{-1}*a = e \] □



Definition A group $G$ is abelian, or commutative, if $a*b = b*a$, for all
$a, b \in G$. When a group is abelian, it is customary to denote the
operation $*$ by $+$, thus writing $a*b$ as $a+b$. It is also customary to
refer to the identity as a zero element, and to denote the inverse $a^{-1}$
by $-a$, referred to as the negative of $a$. □

Example 0.7 The set of all bijective functions from a set $S$ to $S$
is a group under composition of functions. □
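
For a small set, the group axioms of Example 0.7 can be verified by brute force. The sketch below is an illustration only: it takes $S = \{0,1,2\}$ and stores a bijection as a tuple of images; the names `compose`, `inverse` and `identity` are ours.

```python
from itertools import permutations

S = (0, 1, 2)
# A bijection f: S -> S is stored as a tuple, with f[i] the image of i.
G = list(permutations(S))

def compose(f, g):
    """Composition (f o g)(i) = f(g(i))."""
    return tuple(f[g[i]] for i in S)

def inverse(f):
    """The inverse bijection of f."""
    inv = [0] * len(S)
    for i in S:
        inv[f[i]] = i
    return tuple(inv)

identity = tuple(S)

# The group is not abelian: some pair of bijections does not commute.
print(any(compose(f, g) != compose(g, f) for f in G for g in G))
```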

Example 0.8 The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries
in a field $F$ is an abelian group under addition of matrices. The
identity is the zero matrix $0_{m,n}$ of size $m \times n$.

The set $\mathcal{M}_n(F)$ of all $n \times n$ matrices is not a group under
multiplication of matrices, since not all matrices have multiplicative
inverses. However, the set of invertible matrices of size $n \times n$ is a
nonabelian group under multiplication. □

A group $G$ is finite if it contains only a finite number of
elements. The cardinality of a finite group $G$ is called its order and is
denoted by $o(G)$. Thus, for example, $\mathbb{Z}_n$ is a finite group, but
$\mathcal{M}_{m,n}(\mathbb{R})$ is not finite.

Rings

Definition A ring is a nonempty set $R$, together with two binary
operations, called addition (denoted by $+$), and multiplication (denoted
by juxtaposition), for which the following hold.

1) $R$ is an abelian group under addition.

2) (associativity) For all $a, b, c \in R$,
\[ (ab)c = a(bc) \]

3) (distributivity) For all $a, b, c \in R$,
\[ (a + b)c = ac + bc \quad\text{and}\quad c(a + b) = ca + cb \] □

Definition A ring $R$ is said to be commutative if $ab = ba$ for all
$a, b \in R$. If a ring $R$ contains an element $e$ with the property that

\[ ae = ea = a \]

for all $a \in R$, we say that $R$ is a ring with identity. The identity $e$
is usually denoted by 1. □

Example 0.9 The set $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$ is a commutative ring under
addition and multiplication modulo $n$

\[ a \oplus b = (a+b) \bmod n, \qquad a \odot b = ab \bmod n \]

The element $1 \in \mathbb{Z}_n$ is the identity. □

Example 0.10 The set of even integers $E \subset \mathbb{Z}$ is a commutative ring
under the usual operations on $\mathbb{Z}$, but it has no identity. □

Example 0.11 The set $\mathcal{M}_n(F)$ is a noncommutative ring under matrix
addition and multiplication. The identity matrix is the identity for
$\mathcal{M}_n(F)$. □

Example 0.12 Let $F$ be a field. The set $F[x]$ of all polynomials in a
single variable $x$, with coefficients in $F$, is a commutative ring, under
the usual operations of polynomial addition and multiplication. What
is the identity for $F[x]$? □

Definition A subring of a ring $R$ is a subset $S$ of $R$ that is a ring in
its own right, using the same operations as defined on $R$. □

Applying the definition is not generally the easiest way to show 
that a subset of a ring is a subring. The following characterization is 
usually easier to apply. 

Theorem 0.18 A nonempty subset $S$ of a ring $R$ is a subring if and
only if

1) $S$ is closed under subtraction, that is,
\[ a, b \in S \;\Rightarrow\; a - b \in S \]

2) $S$ is closed under multiplication, that is,
\[ a, b \in S \;\Rightarrow\; ab \in S \] ■



Integral Domains 

Definition Let $R$ be a ring. A nonzero element $r \in R$ is called a zero
divisor if there exists a nonzero $s \in R$ for which $rs = 0$. A
commutative ring $R$ with identity is called an integral domain if it
contains no zero divisors. □

Example 0.13 If $n$ is not a prime number, then the ring $\mathbb{Z}_n$ has zero
divisors, and so is not an integral domain. To see this, observe that if
$n$ is not prime, then $n = ab$ in $\mathbb{Z}$, where $a, b \geq 2$. But in $\mathbb{Z}_n$, we
have

\[ a \odot b = ab \bmod n = n \bmod n = 0 \]

and so $a$ and $b$ are both zero divisors. As we will see later, if $n$ is a
prime, then $\mathbb{Z}_n$ is an integral domain. □
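
The claim of Example 0.13 is easy to test numerically for small moduli. The following sketch (the function name `zero_divisors` is ours) lists the zero divisors of $\mathbb{Z}_n$ by exhaustive search.

```python
def zero_divisors(n):
    """Nonzero a in Z_n for which a*b = 0 (mod n) for some nonzero b in Z_n."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]

for n in (4, 6, 7):
    print(n, zero_divisors(n))
```

For the composite moduli 4 and 6 the list is nonempty, while for the prime modulus 7 it is empty, in agreement with the example.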

Example 0.14 The ring $F[x]$ is an integral domain, since $p(x)q(x) = 0$
implies that $p(x) = 0$ or $q(x) = 0$. □

If $R$ is a ring and $rx = ry$ for $r, x, y \in R$, then we cannot, in
general, cancel the $r$'s and conclude that $x = y$. For instance, in $\mathbb{Z}_4$,
we have $2 \cdot 3 = 2 \cdot 1$, but we cannot cancel the 2's to get $3 = 1$.
However, it is precisely the integral domains in which we can cancel.

Theorem 0.19 Let $R$ be a commutative ring with identity. Then $R$
is an integral domain if and only if the cancellation law

\[ r \neq 0 \text{ and } rx = ry \;\Rightarrow\; x = y \]

holds in $R$.

Proof. Suppose that $R$ is an integral domain. Then

\[ r \neq 0 \text{ and } rx = ry \;\Rightarrow\; r(x - y) = 0 \;\Rightarrow\; x - y = 0 \;\Rightarrow\; x = y \]

Conversely, suppose that the cancellation law holds and that $ab = 0$.
If $a \neq 0$, then we have $ab = a0$, and so $b = 0$. Hence, $R$ is an
integral domain. ■

Ideals and Principal Ideal Domains

Rings have another important substructure, besides subrings. 

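Definition Let $R$ be a ring. A subset $\mathcal{I}$ of $R$ is called an ideal if

1) $\mathcal{I}$ is closed under subtraction, that is,
\[ a, b \in \mathcal{I} \;\Rightarrow\; a - b \in \mathcal{I} \]

2) $\mathcal{I}$ is closed under multiplication by any ring element, that is,
\[ a \in \mathcal{I},\ r \in R \;\Rightarrow\; ar \in \mathcal{I} \text{ and } ra \in \mathcal{I} \] □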

Observe that a subring is closed under multiplication, in the sense 
that the product of two elements in the subring is also in the subring. 
However, an ideal has a stronger closure property, namely, the product 
of an element in the ideal and any element in the ring is in the ideal. 

Example 0.15 Let $p(x)$ be a polynomial in $F[x]$. The set of all
multiples of $p(x)$

\[ \langle p(x) \rangle = \{ q(x)p(x) \mid q(x) \in F[x] \} \]

is an ideal in $F[x]$. □

Definition Let $S$ be a subset of a ring $R$ with identity. The set

\[ \langle s_1, \ldots, s_n \rangle = \{ r_1 s_1 + \cdots + r_n s_n \mid r_i \in R,\ s_i \in S \} \]

is an ideal in $R$, called the ideal generated by $S$. It is the smallest (in
the sense of set inclusion) ideal of $R$ containing $S$. □

Note that in the previous definition, we require that $R$ have an
identity. This is to ensure that, for example, $s \in \langle s \rangle$.

Definition Let $R$ be a ring with identity, and let $a \in R$. The
principal ideal generated by $a$ is the ideal

\[ \langle a \rangle = \{ ra \mid r \in R \} \] □

We will use the following algebraic structure quite a bit in the 
sequel. 

Theorem 0.20 Let $R$ be a ring.

1) The intersection of any collection $\{\mathcal{I}_k \mid k \in K\}$ of ideals is an
ideal.

2) If $\mathcal{I}_1 \subset \mathcal{I}_2 \subset \cdots$ is an ascending sequence of ideals, each one
contained in the next, then the union $\bigcup \mathcal{I}_k$ is also an ideal.

Proof. To prove (1), let $J = \bigcap \mathcal{I}_k$. Then if $a, b \in J$, we have $a, b \in \mathcal{I}_k$
for all $k \in K$. Hence, $a - b \in \mathcal{I}_k$ for all $k \in K$, and so $a - b \in J$.
Hence, $J$ is closed under subtraction. Also, if $r \in R$, then $ra \in \mathcal{I}_k$ for
all $k \in K$, and so $ra \in J$.

To prove (2), observe that if $a, b \in \bigcup \mathcal{I}_k$, then $a \in \mathcal{I}_i$ and $b \in \mathcal{I}_j$
for some $i, j \in \mathbb{N}$. Hence, if $m = \max\{i, j\}$, we have $a, b \in \mathcal{I}_m$, and so
$a - b \in \mathcal{I}_m \subset \bigcup \mathcal{I}_k$. Hence, $\bigcup \mathcal{I}_k$ is closed under subtraction. Also, if
$r \in R$ and $a \in \bigcup \mathcal{I}_k$, then $a \in \mathcal{I}_i$ for some $i \in \mathbb{N}$, and so
$ra \in \mathcal{I}_i \subset \bigcup \mathcal{I}_k$. Thus, $\bigcup \mathcal{I}_k$ is closed under multiplication by any ring
element, and so it is an ideal. ■

Note that in general the union of ideals is not an ideal. However, 
as we have just proved, the union of an ascending chain of ideals is an 
ideal. 
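
A concrete instance of the failure, chosen by us for illustration: in the ring of integers, the union of the ideals of multiples of 2 and of multiples of 3 contains 3 and 2 but not their difference. The helper name `in_union` is ours.

```python
def in_union(k):
    """Membership of the integer k in the union of the ideals <2> and <3>."""
    return k % 2 == 0 or k % 3 == 0

a, b = 3, 2
print(in_union(a), in_union(b), in_union(a - b))
```

Both 3 and 2 lie in the union, but $3 - 2 = 1$ does not, so the union is not closed under subtraction.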

Definition An integral domain $R$ in which every ideal is a principal
ideal is called a principal ideal domain. □

Theorem 0.21 The integers form a principal ideal domain. In fact, any
nonzero ideal $\mathcal{I}$ in $\mathbb{Z}$ is generated by the smallest positive integer $a$
that is contained in $\mathcal{I}$. ■
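
For the ideal generated by two integers $a$ and $b$, the smallest positive element is $\gcd(a,b)$, and the extended Euclidean algorithm exhibits it as a combination $xa + yb$. The sketch below is a standard implementation written for this illustration; the name `extended_gcd` is ours.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and x*a + y*b == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # normalize so that the generator is positive
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

g, x, y = extended_gcd(12, 18)
print(g, x * 12 + y * 18)
```

Since every element $12r + 18s$ is a multiple of $g$, and $g$ itself has this form, the ideal generated by 12 and 18 is exactly the set of multiples of $g$.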

Theorem 0.22 The ring $F[x]$ is a principal ideal domain. In fact, any
nonzero ideal $\mathcal{I}$ is generated by the unique monic polynomial of smallest
degree contained in $\mathcal{I}$. Moreover, for polynomials $p_1, \ldots, p_n$,

\[ \langle p_1, \ldots, p_n \rangle = \langle \gcd\{p_1, \ldots, p_n\} \rangle \]

Proof. Let $\mathcal{I}$ be a nonzero ideal in $F[x]$, and let $m(x)$ be a monic
polynomial of smallest degree in $\mathcal{I}$. First, we observe that there is only
one such polynomial in $\mathcal{I}$. For if $n(x) \in \mathcal{I}$ is monic, and $\deg n(x) =
\deg m(x)$, then

\[ b(x) = m(x) - n(x) \in \mathcal{I} \]

and since $\deg b(x) < \deg m(x)$, we must have $b(x) = 0$, and so
$n(x) = m(x)$.

Now, let us show that $\mathcal{I}$ is generated by $m(x)$. Since $\mathcal{I}$ is an
ideal, and $m(x) \in \mathcal{I}$, we have

\[ \langle m(x) \rangle \subset \mathcal{I} \]

To establish the reverse inclusion, if $p(x) \in \mathcal{I}$, then dividing $p(x)$
by $m(x)$ gives

\[ p(x) = q(x)m(x) + r(x) \]

where $r(x) = 0$ or $0 \leq \deg r(x) < \deg m(x)$. But since $\mathcal{I}$ is an ideal,

\[ r(x) = p(x) - q(x)m(x) \in \mathcal{I} \]

and so, by the minimality of $\deg m(x)$, the case $0 \leq \deg r(x) < \deg m(x)$
is impossible. Hence, $r(x) = 0$, and

\[ p(x) = q(x)m(x) \in \langle m(x) \rangle \]

This shows that $\mathcal{I} \subset \langle m(x) \rangle$, and so $\mathcal{I} = \langle m(x) \rangle$.

To prove the second statement, let $\mathcal{I} = \langle p_1(x), \ldots, p_n(x) \rangle$. Then,
by what we have just shown,

\[ \mathcal{I} = \langle p_1(x), \ldots, p_n(x) \rangle = \langle m(x) \rangle \]

for the unique monic polynomial $m(x)$ in $\mathcal{I}$ of smallest degree. In
particular, since $p_i(x) \in \langle m(x) \rangle$, we have

\[ p_i(x) = a_i(x)m(x) \]

for some polynomial $a_i(x)$, and so $m(x) \mid p_i(x)$, for each $i = 1, \ldots, n$.
In other words, $m(x)$ is a common divisor of the $p_i(x)$'s.

Moreover, if $q(x) \mid p_i(x)$, for all $i$, then each $p_i(x)$ is a multiple
of $q(x)$, and so

\[ p_i(x) \in \langle q(x) \rangle \]

for all $i$, which implies that

\[ \langle m(x) \rangle = \langle p_1(x), \ldots, p_n(x) \rangle \subset \langle q(x) \rangle \]

In particular, this implies that $m(x) \in \langle q(x) \rangle$, and so $q(x) \mid m(x)$. This
shows that $m(x)$ is the greatest common divisor of the $p_i(x)$'s, and
completes the proof. ■
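
The division step used in the proof, and the resulting monic generator of an ideal generated by two polynomials, can be sketched over the rationals as follows. This is an illustration under our own conventions: a polynomial is a list of coefficients, lowest degree first, and the names `trim`, `poly_divmod` and `monic_gcd` are ours.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, lowest degree first;
# the zero polynomial is the empty list.

def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def poly_divmod(p, m):
    """Return (q, r) with p = q*m + r and r = 0 or deg r < deg m (m nonzero)."""
    p, m = trim(p), trim(m)
    q = [Fraction(0)] * max(len(p) - len(m) + 1, 0)
    r = [Fraction(c) for c in p]
    while len(r) >= len(m):
        shift = len(r) - len(m)
        coeff = r[-1] / m[-1]          # cancels the leading term of r
        q[shift] = coeff
        for i, c in enumerate(m):
            r[i + shift] -= coeff * c
        r = trim(r)
    return trim(q), r

def monic_gcd(p, q):
    """Monic generator of the ideal <p, q> in Q[x], by the Euclidean algorithm."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, poly_divmod(p, q)[1]
    return [Fraction(c) / p[-1] for c in p]

# x^2 - 1 = (x - 1)(x + 1) and x^2 + x - 2 = (x - 1)(x + 2)
print(monic_gcd([-1, 0, 1], [-2, 1, 1]))
```

Here the two polynomials share the factor $x - 1$, and the result is its coefficient list.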

Example 0.16 Let $R = F[x,y]$ be the ring of polynomials in two
variables $x$ and $y$. Then $R$ is not a principal ideal domain. To see
this, observe that the set $\mathcal{I}$ of all polynomials with zero constant
term is an ideal in $R$. Also, $x \in \mathcal{I}$ and $y \in \mathcal{I}$. Now, suppose that $\mathcal{I}$
is the principal ideal $\mathcal{I} = \langle p(x,y) \rangle$. Then there exist polynomials $a(x,y)$
and $b(x,y)$ for which

\[ x = a(x,y)p(x,y) \quad\text{and}\quad y = b(x,y)p(x,y) \tag{0.1} \]

But if $p(x,y)$ is a constant polynomial, then $\mathcal{I} = \langle p(x,y) \rangle$ is all of $R$,
which is not the case. Hence, $\deg(p(x,y)) \geq 1$, and so $a(x,y)$ and
$b(x,y)$ must both be constants, which implies that (0.1) cannot
possibly hold. □

Prime Elements 

We can define the notion of a prime element in any integral
domain. For $r, s \in R$, we say that $r$ divides $s$ (written $r \mid s$) if there
exists an $x \in R$ for which $s = xr$.

Definition Let $R$ be an integral domain.

1) An invertible element of $R$ is called a unit. Thus, $u \in R$ is a
unit if $uv = 1$ for some $v \in R$.

2) Two elements $a, b \in R$ are said to be associates if there exists a
unit $u$ for which $a = ub$.

3) A nonzero nonunit $p \in R$ is said to be prime if $p \mid ab \Rightarrow p \mid a$
or $p \mid b$.

4) A nonzero nonunit $r \in R$ is said to be irreducible if $r = ab \Rightarrow$
either $a$ or $b$ is a unit. □

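Theorem 0.23

1) An element $u \in R$ is a unit if and only if $\langle u \rangle = R$.

2) $r$ and $s$ are associates if and only if $\langle r \rangle = \langle s \rangle$.

3) $r$ divides $s$ if and only if $\langle s \rangle \subset \langle r \rangle$.

4) $r$ properly divides $s$ (that is, $s = xr$ where $x$ is not a unit) if
and only if $\langle s \rangle \subsetneq \langle r \rangle$. ■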

In the case of the integers, an integer is prime if and only if it is 
irreducible. However, this is not the case in general. But it is true for 
principal ideal domains. 

Theorem 0.24 Let $R$ be a principal ideal domain.

1) If $r \in R$ is irreducible, then the principal ideal $\langle r \rangle$ is maximal,
that is, $\langle r \rangle \neq R$ and there is no ideal $\langle a \rangle$ for which $\langle r \rangle \subsetneq \langle a \rangle \subsetneq R$.

2) An element in $R$ is prime if and only if it is irreducible.

3) Any nonzero $r \in R$ can be written as a product

\[ r = u p_1 \cdots p_n \]

where $u$ is a unit, and $p_1, \ldots, p_n$ are primes. Furthermore, this
factorization is unique up to order, and unit element $u$.

Proof. To prove (1), suppose that $r$ is irreducible, and that
$\langle r \rangle \subset \langle a \rangle \subset R$. Then $r \in \langle a \rangle$, and so $r = xa$ for some $x \in R$. The
irreducibility of $r$ now implies that $a$ or $x$ is a unit. But if $a$ is a
unit, then $\langle a \rangle = R$, and if $x$ is a unit, then $\langle a \rangle = \langle xa \rangle = \langle r \rangle$. This
shows that $\langle r \rangle$ is maximal. (We have $\langle r \rangle \neq R$, since $r$ is not a unit.)

To prove (2), assume first that $p$ is prime, and let $p = ab$.
Then $p \mid ab$, and so $p \mid a$ or $p \mid b$. We may assume that $p \mid a$.
Therefore, $a = xp$, and $p = ab = xpb$. Canceling $p$'s, we get $1 = xb$,
and so $b$ is a unit. Hence, $p$ is irreducible. (Note that this argument
applies in any integral domain.)

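Conversely, suppose that $r$ is irreducible, and let $r \mid ab$. We
wish to prove that $r \mid a$ or $r \mid b$. In the terminology of ideals, we
assume that $ab \in \langle r \rangle$, where by part (1), $\langle r \rangle$ is maximal, and we want
to show that $a \in \langle r \rangle$ or $b \in \langle r \rangle$. But

\[ a \notin \langle r \rangle \;\Rightarrow\; \langle a, r \rangle = R \;\Rightarrow\; 1 = xa + yr, \text{ for some } x, y \in R \]

and

\[ b \notin \langle r \rangle \;\Rightarrow\; \langle b, r \rangle = R \;\Rightarrow\; 1 = x'b + y'r, \text{ for some } x', y' \in R \]

From this, if neither $a$ nor $b$ were in $\langle r \rangle$, we would get (using $ab \in \langle r \rangle$)

\[ 1 = (xa + yr)(x'b + y'r) = xx'ab + xy'ar + yx'br + yy'r^2 \in \langle r \rangle \]

which implies that $r$ is a unit. This contradiction shows that $a \in \langle r \rangle$
or $b \in \langle r \rangle$.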

To prove (3), let $r \in R$ be nonzero. If $r$ is irreducible, then we are
done. If not, then $r = r_1 r_2$, where neither factor is a unit. If $r_1$ and
$r_2$ are irreducible, we are done. If not, suppose that $r_2$ is not
irreducible. Then $r_2 = r_3 r_4$, where neither $r_3$ nor $r_4$ is a unit.
Continuing in this way, we obtain a factorization of the form (after
renumbering if necessary)

\[ r = r_1 r_2 = r_1 (r_3 r_4) = (r_1 r_3)(r_5 r_6) = (r_1 r_3 r_5)(r_7 r_8) = \cdots \tag{0.2} \]

Each step is a factorization of $r$ into a product of nonunits. However,
this process must stop after a finite number of steps. To see this,
observe that since

\[ r_2 \mid r,\quad r_4 \mid r_2,\quad r_6 \mid r_4,\ \ldots \]

the sequence (0.2) gives rise to an ascending sequence of ideals

\[ \langle r \rangle \subset \langle r_2 \rangle \subset \langle r_4 \rangle \subset \langle r_6 \rangle \subset \cdots \]

Moreover, since none of the $r_i$'s is a unit, the inclusions in this chain
are proper. Now, if the factorization process did not stop, we would
obtain an infinite ascending sequence of such ideals. But, according to
Theorem 0.20, the union $U$ of all of these ideals would be another
ideal in $R$, which must be principal. Suppose that $U = \langle a \rangle$. Then
$a \in U$ and so $a \in \langle r_{2n} \rangle$ for some $n$. But this is not possible, since it
would imply that

\[ U = \langle a \rangle \subset \langle r_{2n} \rangle \]

which implies that $\langle r_{2n} \rangle = \langle r_{2(n+1)} \rangle = \cdots$, contradicting the fact that
the inclusions are proper. ■

Fields 

In a ring, addition is “stronger” than multiplication, in the sense 
that it must possess more properties. In a field, the two operations 
have essentially the same strength. 

Definition A field is a set $F$, containing at least two elements, together
with two binary operations, called addition (denoted by $+$) and
multiplication (denoted by juxtaposition), for which the following hold.

1) $F$ is an abelian group under addition.

2) The set $F^*$ of all nonzero elements in $F$ is an abelian group
under multiplication.

3) (distributivity) For all $a, b, c \in F$,
\[ (a + b)c = ac + bc \quad\text{and}\quad c(a + b) = ca + cb \] □

We require that F have at least two elements to avoid the 

pathological case where 0 = 1. 

Example 0.17 The sets Q, R and C, of all rational, real and complex 
numbers, respectively, are fields, under the usual operations of addition 
and multiplication of numbers. D 

Example 0.18 The ring $\mathbb{Z}_n$ is a field if and only if $n$ is a prime
number. We have already seen that $\mathbb{Z}_n$ is not a field if $n$ is not
prime, since a field is also an integral domain. Now suppose that $n =
p$ is a prime.

Since $\mathbb{Z}_p$ is a commutative ring with identity, it remains to
show that every nonzero element in $\mathbb{Z}_p$ has a multiplicative inverse.
Let $0 \neq a \in \mathbb{Z}_p$. Since $a < p$, we know that $a$ and $p$ are relatively
prime. It follows that there exist integers $u$ and $v$ for which

\[ ua + vp = 1 \]

Hence,

\[ ua = (1 - vp) \equiv 1 \bmod p \]

and so $u \odot a = 1$ in $\mathbb{Z}_p$ (with $u$ reduced modulo $p$), that is, $u$ is the
multiplicative inverse of $a$. □
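
The integers $u$ and $v$ in Example 0.18 can be computed with the extended Euclidean algorithm. The sketch below tracks only the coefficient $u$, which is all that is needed for the inverse; the function name `inverse_mod` is ours.

```python
def inverse_mod(a, p):
    """Multiplicative inverse of a in Z_p, obtained from u*a + v*p = 1."""
    r0, r1 = a % p, p
    u0, u1 = 1, 0                 # invariant: r0 = u0*a (mod p), r1 = u1*a (mod p)
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        u0, u1 = u1, u0 - q * u1
    if r0 != 1:
        raise ValueError("a and p are not relatively prime")
    return u0 % p

p = 7
print([(a, inverse_mod(a, p)) for a in range(1, p)])
```

Every nonzero element of $\mathbb{Z}_7$ receives an inverse, as the example predicts for a prime modulus.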

The previous example shows that not all fields are infinite sets. In 
fact, finite fields play an extremely important role in many areas of 
abstract and applied mathematics. 



The Characteristic of a Ring 

Let $R$ be a ring. If $n$ is a positive integer, then by $n \cdot r$, we
simply mean

\[ n \cdot r = \underbrace{r + \cdots + r}_{n \text{ terms}} \]

Now, it may happen that there is a positive integer $c$ for which

\[ c \cdot 1 = \underbrace{1 + \cdots + 1}_{c \text{ terms}} = 0 \]

For instance, in $\mathbb{Z}_n$, we have $n \cdot 1 = n = 0$. On the other hand, in $\mathbb{Z}$,
$c \cdot 1 = 0$ implies $c = 0$, and so no such positive integer exists.

Notice that, in any finite ring or field, there must exist such a
positive integer $c$, since the elements of the infinite sequence

\[ 1 \cdot 1,\ 2 \cdot 1,\ 3 \cdot 1,\ \ldots \]

cannot all be distinct, and so $i \cdot 1 = j \cdot 1$ for some $i \neq j$. Hence, if $i < j$,
we have $(j - i) \cdot 1 = 0$.

Definition Let $R$ be a ring. The smallest positive integer $c$ for which
$c \cdot 1 = 0$ is called the characteristic of $R$. If no such number $c$ exists,
we say that $R$ has characteristic 0. The characteristic of $R$ is
denoted by $\operatorname{char}(R)$. □
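
For a finite ring, the characteristic can be found by adding 1 to itself until 0 appears. The sketch below does this for the ring of integers modulo $n$; the function name `characteristic_of_Zn` is ours, and the loop terminates because the ring is finite.

```python
def characteristic_of_Zn(n):
    """Smallest positive c with c * 1 = 0 in Z_n, found by repeated addition."""
    total, c = 0, 0
    while True:
        total = (total + 1) % n   # add one more copy of the identity
        c += 1
        if total == 0:
            return c

print([characteristic_of_Zn(n) for n in (2, 5, 12)])
```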

If $\operatorname{char}(R) = c$, then for any $r \in R$, we have

\[ c \cdot r = \underbrace{r + \cdots + r}_{c \text{ terms}} = (\underbrace{1 + \cdots + 1}_{c \text{ terms}})\, r = 0 \cdot r = 0 \]

Theorem 0.25 Any finite ring has nonzero characteristic. Furthermore, 
any finite field has prime characteristic. 

Proof. We have already seen that a finite ring has nonzero
characteristic. Let $F$ be a finite field, and suppose that $\operatorname{char}(F) =
c > 0$. If $c = pq$, where $p, q < c$, then $pq \cdot 1 = 0$. Hence,
$(p \cdot 1)(q \cdot 1) = 0$, implying that $p \cdot 1 = 0$ or $q \cdot 1 = 0$. In either case,
we have a contradiction to the fact that $c$ is the smallest positive
integer such that $c \cdot 1 = 0$. Hence, $c$ must be prime. ■

Notice that in any field $F$ of characteristic 2, we have $2a = 0$
for all $a \in F$. Thus, in $F$, we have

\[ 2 = 0, \quad\text{and}\quad a = -a, \text{ for all } a \in F \]

These properties take a bit of getting used to, and make fields of 
characteristic 2 quite exceptional. (As it happens, there are many 
important uses for fields of characteristic 2.) 

Vector Spaces

Let us begin with the definition of our principal object of study.

Definition Let $F$ be a field, whose elements are referred to as scalars.
A vector space over $F$ is a nonempty set $V$, whose elements are
referred to as vectors, together with two operations. The first
operation, called addition and denoted by $+$, assigns to each pair
$(u,v) \in V \times V$ of vectors in $V$ a vector $u + v$ in $V$. The second
operation, called scalar multiplication and denoted by juxtaposition,
assigns to each pair $(r,u) \in F \times V$ a vector $ru$ in $V$. Furthermore, the
following properties must be satisfied.

1) (Associativity of addition)
\[ u + (v + w) = (u + v) + w \]
for all vectors $u, v, w \in V$.

2) (Commutativity of addition)
\[ u + v = v + u \]
for all vectors $u, v \in V$.

3) (Existence of a zero)
There is a vector $0 \in V$ with the property that
\[ 0 + u = u + 0 = u \]
for all vectors $u \in V$.

4) (Existence of additive inverses)
For each vector $u \in V$, there is a vector in $V$, denoted by $-u$,
with the property that
\[ u + (-u) = (-u) + u = 0 \]

5) (Properties of scalar multiplication)
For all scalars $r$ and $s$, we have
\[ r(u + v) = ru + rv \]
\[ (r + s)u = ru + su \]
\[ (rs)u = r(su) \]
\[ 1u = u \]
for all vectors $u, v \in V$. □

Note that the first four properties in the definition of vector space
can be summarized by saying that $V$ is an abelian group under addition.
Any expression of the form

\[ r_1 v_1 + \cdots + r_n v_n \]

where $r_i \in F$ and $v_i \in V$ for all $i$, is called a linear combination of
the vectors $v_1, \ldots, v_n$.

Example 1.1

1) Let $F$ be a field. The set of all functions from $F$ to $F$ is a
vector space over $F$, under the operations of ordinary addition
and scalar multiplication of functions

\[ (f + g)(x) = f(x) + g(x) \]

and

\[ (rf)(x) = r(f(x)) \]

2) The set $\mathcal{M}_{m,n}(F)$ of all $m \times n$ matrices with entries in a field $F$
is a vector space over $F$, under the operations of matrix addition
and scalar multiplication.

3) The set $F^n$ of all ordered n-tuples, whose components lie in a
field $F$, is a vector space over $F$, with addition and scalar
multiplication defined componentwise

\[ (a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n) \]

and

\[ r(a_1, \ldots, a_n) = (ra_1, \ldots, ra_n) \]

When convenient, we will also write the elements of $F^n$ in column
form. When $F$ is a finite field $F_q$ with $q$ elements, we use the
notation $V(n,q)$, rather than $F_q^n$. Thus, $V(n,q)$ is the set of all
ordered n-tuples, whose components come from the finite field $F_q$.

4) There are various sequence spaces that are vector spaces. The set
$\operatorname{Seq}(F)$ of all infinite sequences, whose entries lie in a field $F$, is a
vector space, under componentwise operations

\[ (s_n) + (t_n) = (s_n + t_n) \]

and

\[ r(s_n) = (rs_n) \]

In a similar way, the set $c_0$ of all sequences of complex numbers
that converge to 0 is a vector space, as is the set $\ell^\infty$ of all
bounded complex sequences. Also, if $p$ is a positive integer, then
the set $\ell^p$ of all complex sequences $(s_n)$ for which

\[ \sum_{n=1}^{\infty} |s_n|^p < \infty \]

is a vector space under componentwise operations.
To see that addition is a binary operation on $\ell^p$, one verifies
Minkowski's inequality

\[ \Bigl( \sum_{n=1}^{\infty} |s_n + t_n|^p \Bigr)^{1/p} \leq \Bigl( \sum_{n=1}^{\infty} |s_n|^p \Bigr)^{1/p} + \Bigl( \sum_{n=1}^{\infty} |t_n|^p \Bigr)^{1/p} \]

which we will not do here. (See the exercises in Chapter 12.) □



Subspaces 

Most algebraic structures contain substructures, and vector spaces 
are no exception. 

Definition A subspace of a vector space V is a subset S of V that 
is a vector space in its own right, under the operations obtained by 
restricting the operations of V to S. D 

Since many of the properties of addition and scalar multiplication
hold, a fortiori, in the subset $S$, we can establish that a nonempty
subset is a subspace merely by checking that the subset is closed under
the operations of $V$.

Theorem 1.1 A nonempty subset $S$ of a vector space $V$ is a subspace
if and only if

1) $S$ is closed under addition, that is,
\[ u, v \in S \;\Rightarrow\; u + v \in S \]

2) $S$ is closed under scalar multiplication, that is,
\[ r \in F,\ u \in S \;\Rightarrow\; ru \in S \]



Equivalently, $S$ is a subspace if and only if

3) $S$ is closed under taking linear combinations, that is,
\[ r, s \in F,\ u, v \in S \;\Rightarrow\; ru + sv \in S \] □

Example 1.2 Consider the vector space $V(n,2)$ of all binary n-tuples.
The weight $w(v)$ of a vector $v \in V(n,2)$ is the number of nonzero
coordinates in $v$. For instance, $w(101010) = 3$. Let $E_n$ be the set of
all vectors in $V(n,2)$ of even weight. Then $E_n$ is a subspace of $V(n,2)$.

To see this, note that

\[ w(u + v) = w(u) + w(v) - 2w(u \cap v) \]

where $u \cap v$ is the vector in $V(n,2)$ whose ith component is the
product of the ith components of $u$ and $v$, taken modulo 2. That is,

\[ (u \cap v)_i = (u_i \cdot v_i) \bmod 2 \]

Hence, if $w(u)$ and $w(v)$ are both even, so is $w(u + v)$. Finally,
scalar multiplication over $F_2$ is trivial, and so $E_n$ is a subspace of
$V(n,2)$, known as the even weight subspace of $V(n,2)$. □
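
The weight identity and the closure of the even weight vectors under addition can be confirmed by exhaustive search for a small value of $n$. The sketch below takes $n = 4$; the names `weight`, `add` and `meet` are ours, with `meet` standing for the componentwise product $u \cap v$.

```python
from itertools import product

def weight(v):
    """Number of nonzero coordinates of v."""
    return sum(1 for x in v if x != 0)

def add(u, v):
    """Componentwise sum modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def meet(u, v):
    """Componentwise product modulo 2 (the vector written u ∩ v in the text)."""
    return tuple((a * b) % 2 for a, b in zip(u, v))

n = 4
V = list(product((0, 1), repeat=n))          # all binary n-tuples
E = [v for v in V if weight(v) % 2 == 0]     # vectors of even weight

print(len(V), len(E))
print(all(add(u, v) in E for u in E for v in E))
```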

Example 1.3 Any subspace of the vector space V(n,q) is called a 
linear code. Linear codes are among the most important, and most 
studied, types of codes, because their structure allows for efficient 
encoding and decoding of information. For a detailed discussion of 
linear (and other) codes, see Roman [1992]. D 



The Lattice of Subspaces 

The set $\mathcal{L}(V)$ of all subspaces of a vector space $V$ is partially
ordered by set inclusion. The zero subspace $\{0\}$ is the smallest
element in $\mathcal{L}(V)$, and the entire space $V$ is the largest element.

If $S, T \in \mathcal{L}(V)$, then $S \cap T$ is the largest subspace of $V$ that
is contained in both $S$ and $T$. Hence, in $\mathcal{L}(V)$, the greatest lower
bound of $S$ and $T$ is

\[ \operatorname{glb}\{S,T\} = S \cap T \]

Similarly, if $\{S_i \mid i \in K\}$ is any collection of subspaces of $V$, then the
intersection

\[ \bigcap_{i \in K} S_i \]

is also a subspace of $V$, and is the greatest lower bound of the
collection $\{S_i\}$.

On the other hand, if $S, T \in \mathcal{L}(V)$, then $S \cup T \in \mathcal{L}(V)$ if and only
if $S \subset T$ or $T \subset S$. Thus, the union of subspaces is never a subspace
in any “interesting” case. We also have the following.

Theorem 1.2 A vector space $V$ over an infinite field $F$ is never the
union of a finite number of proper subspaces.

Proof. Suppose that $V = S_1 \cup \cdots \cup S_n$, where we may assume that
$S_1 \not\subset S_2 \cup \cdots \cup S_n$. Let $w \in S_1 - (S_2 \cup \cdots \cup S_n)$, and let $v \notin S_1$.
Consider the infinite set $A = \{w + rv \mid r \in F\}$. (This is the “line”
through $w$, parallel to $v$.) We want to show that each $S_i$ contains at
most one vector from the infinite set $A$, which is contrary to the fact
that $V = S_1 \cup \cdots \cup S_n$, and so this will prove the theorem.

Suppose that $w + rv \in S_1$ for $r \neq 0$. Then since $w \in S_1$, we
would have $rv \in S_1$, or $v \in S_1$, contrary to assumption. Next, suppose
that $w + r_1 v,\ w + r_2 v \in S_i$, for $i \geq 2$, where $r_1 \neq r_2$. Then

\[ S_i \ni r_2 (w + r_1 v) - r_1 (w + r_2 v) = (r_2 - r_1) w \]

and so $w \in S_i$, which is also contrary to assumption. ■
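
The hypothesis that the field is infinite cannot be dropped. As an illustration of our own choosing, the four-element space $V(2,2)$ over the two-element field is the union of its three one-dimensional subspaces, which the sketch below checks directly; the name `span` is ours.

```python
from itertools import product

V = set(product((0, 1), repeat=2))       # the four vectors of V(2,2)

def span(v):
    """The subspace of V(2,2) spanned by v over the two-element field: {0, v}."""
    return {(0, 0), v}

lines = [span(v) for v in V if v != (0, 0)]   # the proper nonzero subspaces

print(len(lines))
print(set().union(*lines) == V)
```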

To determine the smallest subspace of $V$ containing the
subspaces $S$ and $T$, we make the following definition.

Definition Let $S$ and $T$ be subspaces of $V$. The sum $S + T$ is the
set of all sums of vectors from $S$ and $T$, that is,

\[ S + T = \{ u + v \mid u \in S,\ v \in T \} \]

More generally, the sum of any collection $\{S_i \mid i \in K\}$ of subspaces is
the set of all finite sums of vectors from the union $\bigcup S_i$

\[ \sum_{i \in K} S_i = \Bigl\{ s_1 + \cdots + s_n \Bigm| s_j \in \bigcup_{i \in K} S_i \Bigr\} \] □

It is not hard to show that the sum of any collection of subspaces
of $V$ is a subspace of $V$, and that

\[ \operatorname{lub}\{S,T\} = S + T \]

and, more generally,

\[ \operatorname{lub}\{S_i\} = \sum_{i \in K} S_i \]

A partially ordered set in which every pair of elements has a least
upper bound and greatest lower bound is called a lattice.

Theorem 1.3 The set $\mathcal{L}(V)$ of all subspaces of a vector space $V$ is a
lattice under set inclusion, with

\[ \operatorname{glb}\{S,T\} = S \cap T \quad\text{and}\quad \operatorname{lub}\{S,T\} = S + T \] □

Direct Sums

As we will see, there are many ways to construct new vector
spaces from old ones.

Definition Let $V_1, \ldots, V_n$ be vector spaces over the same field $F$.
The external direct sum of $V_1, \ldots, V_n$, denoted by $V = V_1 \boxplus \cdots \boxplus V_n$, is
the vector space $V$ whose elements are ordered n-tuples

\[ V = \{ (v_1, \ldots, v_n) \mid v_i \in V_i,\ i = 1, \ldots, n \} \]

with componentwise operations

\[ (u_1, \ldots, u_n) + (v_1, \ldots, v_n) = (u_1 + v_1, \ldots, u_n + v_n) \]

and

\[ r(v_1, \ldots, v_n) = (rv_1, \ldots, rv_n) \] □

Example 1.4 The vector space $F^n$ is the external direct sum of $n$
copies of $F$, that is,

\[ F^n = F \boxplus \cdots \boxplus F \]

where there are $n$ summands on the right-hand side. □

This construction can be generalized to any collection of vector
spaces, by generalizing the idea that an ordered n-tuple $(v_1, \ldots, v_n)$ is
just a function $f\colon \{1, \ldots, n\} \to \bigcup V_i$, with the property that $f(i) \in V_i$.
One possible generalization is given by the following definition.

Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be any family of vector spaces over $F$.
The direct product of $\mathcal{F}$ is the vector space

\[ \prod_{i \in K} V_i = \Bigl\{ f\colon K \to \bigcup_{i \in K} V_i \Bigm| f(i) \in V_i \Bigr\} \]

thought of as a subspace of the vector space of all functions from $K$ to
$\bigcup V_i$. □

The following will prove more useful to us, however. 

Definition Let $\mathcal{F} = \{V_i \mid i \in K\}$ be a family of vector spaces over $F$.
The support of a function $f\colon K \to \bigcup V_i$ is the set

\[ \operatorname{supp}(f) = \{ i \in K \mid f(i) \neq 0 \} \]

Thus, $f$ has finite support if $f(i) = 0$ for all but a finite number of
$i \in K$. The external direct sum of the family $\mathcal{F}$ is the vector space

\[ \mathop{\boxplus}_{i \in K} V_i = \Bigl\{ f\colon K \to \bigcup_{i \in K} V_i \Bigm| f(i) \in V_i,\ f \text{ has finite support} \Bigr\} \]

thought of as a subspace of the vector space of all functions from $K$ to
$\bigcup V_i$. □
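
A function with finite support can be stored by recording only its nonzero values. The sketch below is an illustration under our own assumptions: every $V_i$ is taken to be the rational (or integer) numbers, a function is a Python dict from indices to nonzero values, and the names `add`, `scale` and `supp` are ours.

```python
def add(f, g):
    """Sum of two finitely supported functions, stored as {index: nonzero value}."""
    h = dict(f)
    for i, x in g.items():
        h[i] = h.get(i, 0) + x
    return {i: x for i, x in h.items() if x != 0}   # keep only the support

def scale(r, f):
    """Scalar multiple r*f of a finitely supported function."""
    return {i: r * x for i, x in f.items() if r * x != 0}

def supp(f):
    """The support of f: the set of indices where f is nonzero."""
    return set(f)

f = {0: 1, 5: 2}
g = {5: -2, 7: 3}
print(add(f, g), supp(add(f, g)))
```

The support of a sum is contained in the union of the supports, which is why the functions of finite support are closed under addition.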

An important special case occurs when $V_i = V$ for all $i \in K$. If
we let $V^K$ denote the set of all functions from $K$ to $V$ and $(V^K)_0$
denote the set of all functions in $V^K$ that have finite support, then

\[ \prod_{i \in K} V = V^K \quad\text{and}\quad \mathop{\boxplus}_{i \in K} V = (V^K)_0 \]

There is also an internal version of the direct sum construction. 

Definition Let $V$ be a vector space. We say that $V$ is the (internal)
direct sum of a family $\mathcal{F} = \{S_i \mid i \in K\}$ of subspaces of $V$ if every
vector $v$ in $V$ can be written, in a unique way (except for order), as a
finite sum of vectors from the subspaces in $\mathcal{F}$, that is, if for all $v \in V$,

\[ v = u_1 + \cdots + u_n \]

for some $u_i \in S_i$, and furthermore, if

\[ v = w_1 + \cdots + w_n \]

where $w_i \in S_i$, then $w_i = u_i$ for all $i = 1, \ldots, n$.

If $V$ is the direct sum of $\mathcal{F}$, we write

\[ V = \bigoplus_{i \in K} S_i \]

and refer to each $S_i$ as a direct summand of $V$. If $\mathcal{F} = \{S_1, \ldots, S_n\}$
is a finite family, we write

\[ V = S_1 \oplus \cdots \oplus S_n \]

If $V = S \oplus T$, then $T$ is called a complement of $S$ in $V$. We will
often write $S^c$ to denote a complement of $S$. □

The reader will be asked in a later chapter to show that the 
concepts of internal and external direct sum are essentially equivalent. 
Since the internal version of direct sum will be used more often, we 
simply refer to it as the direct sum. Once we have discussed the 
concept of a basis, the following theorem can be easily proved. 

Theorem 1.4 Any subspace of a vector space has a complement, that
is, if $S$ is a subspace of $V$, then there exists a subspace $S^c$ for which
$V = S \oplus S^c$. □

It should be emphasized that a subspace generally has many
complements. The reader can easily find examples of this in $\mathbb{R}^2$. The
following characterization of direct sums is quite useful.

Theorem 1.5 A vector space $V$ is the direct sum of a family $\mathcal{F} =
\{S_i \mid i \in K\}$ of subspaces if and only if

1) \[ V = \sum_{i \in K} S_i \]

2) For each $i \in K$,
\[ S_i \cap \Bigl( \sum_{j \neq i} S_j \Bigr) = \{0\} \]

Proof. Suppose first that $V$ is the direct sum of $\mathcal{F}$. Then (1)
certainly holds, and if

\[ v \in S_i \cap \Bigl( \sum_{j \neq i} S_j \Bigr) \]

then

\[ v = 0 + \cdots + 0 + s_i + 0 + \cdots + 0 \]

and

\[ v = s_1 + \cdots + s_{i-1} + 0 + s_{i+1} + \cdots + s_n \]

where $s_j \in S_j$ for all $j$. Hence, by the uniqueness of direct sum
representations, $s_j = 0$ for all $j = 1, \ldots, n$, and so $v = 0$. Thus, (2)
holds.

For the converse, suppose that (1) and (2) hold. Then any vector
$v$ is a sum of vectors from the $S_i$

\[ v = s_1 + \cdots + s_n \]

where $s_i \in S_i$. If

\[ v = t_1 + \cdots + t_n \]

where $t_i \in S_i$, then

\[ (s_1 - t_1) + \cdots + (s_n - t_n) = 0 \]

But if $v_i = s_i - t_i \in S_i$ is nonzero, then $v_i$ can be written as a sum of
vectors from the $S_j$, with $j \neq i$, which contradicts (2). Hence, $s_i = t_i$
for all $i$, and $V$ is the direct sum of $\mathcal{F}$. ■



Example 1.5 Any matrix $A \in \mathcal{M}_n(F)$ can be written in the form

\[ A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T) = B + C \tag{1.1} \]

where $A^T$ is the transpose of $A$. It is easy to verify that $B$ is
symmetric, and $C$ is skew-symmetric, and so (1.1) is a decomposition
of $A$ as the sum of a symmetric matrix and a skew-symmetric matrix.

Since the sets Sym and SkewSym of all symmetric and skew-symmetric
matrices in $\mathcal{M}_n$ are subspaces of $\mathcal{M}_n$, we have

\[ \mathcal{M}_n = \mathrm{Sym} + \mathrm{SkewSym} \]

Furthermore, if $S + T = S' + T'$, where $S$ and $S'$ are symmetric, and
$T$ and $T'$ are skew-symmetric, then the matrix

\[ U = S - S' = T' - T \]

is both symmetric and skew-symmetric. Hence, provided that
$\operatorname{char}(F) \neq 2$, we deduce that $U = 0$, and so $S = S'$ and $T = T'$.
Thus,

\[ \mathcal{M}_n = \mathrm{Sym} \oplus \mathrm{SkewSym} \] □
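
The decomposition (1.1) is easy to compute. The sketch below works over the rational numbers, where division by 2 is available, and stores a matrix as a list of rows; the names `transpose` and `decompose` are ours.

```python
from fractions import Fraction

def transpose(a):
    """Transpose of a square matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def decompose(a):
    """Return (B, C) with B = (A + A^T)/2 symmetric and C = (A - A^T)/2 skew-symmetric."""
    t = transpose(a)
    n = len(a)
    half = Fraction(1, 2)
    b = [[half * (a[i][j] + t[i][j]) for j in range(n)] for i in range(n)]
    c = [[half * (a[i][j] - t[i][j]) for j in range(n)] for i in range(n)]
    return b, c

A = [[1, 2], [4, 3]]
B, C = decompose(A)
print(B)
print(C)
```

For this matrix the symmetric part has off-diagonal entries 3 and the skew-symmetric part has off-diagonal entries $-1$ and $1$.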



Spanning Sets and Linear Independence 

A set of vectors spans a vector space if every vector can be written 
as a linear combination of some of the vectors in that set. 

Definition The subspace spanned (or generated) by a set $S$ of vectors
in $V$ is the set of all linear combinations of vectors from $S$

\[ \langle S \rangle = \operatorname{span}(S) = \{ r_1 v_1 + \cdots + r_n v_n \mid r_i \in F,\ v_i \in S \} \]

When $S = \{v_1, \ldots, v_n\}$ is a finite set, we use the notation $\langle v_1, \ldots, v_n \rangle$,
or $\operatorname{span}\{v_1, \ldots, v_n\}$. A set $S$ of vectors in $V$ is said to span $V$, or
generate $V$, if

\[ V = \operatorname{span}(S) \]

that is, if every vector $v \in V$ can be written in the form

\[ v = r_1 v_1 + \cdots + r_n v_n \]

for some scalars $r_1, \ldots, r_n$ and vectors $v_1, \ldots, v_n$ in $S$. □

It is clear that any superset of a spanning set is also a spanning 
set. Note also that all vector spaces have spanning sets, since the entire 
space is a spanning set. 

Definition A nonempty set $S$ of vectors in $V$ is linearly independent if for any distinct vectors $v_1,\dots,v_n$ in $S$, we have

$$r_1 v_1 + \cdots + r_n v_n = 0 \ \Rightarrow\ r_1 = \cdots = r_n = 0$$

If a set of vectors is not linearly independent, it is linearly dependent. □

It follows from the definition that any nonempty subset of a 
linearly independent set is linearly independent. 
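For a finite list of vectors in $\mathbb{Q}^n$, linear independence can be tested mechanically: the list is independent exactly when Gaussian elimination leaves no zero row. The sketch below assumes rational coordinates, and the function names `rank` and `is_linearly_independent` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (a list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below position r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True when only the trivial linear combination of the vectors equals 0."""
    return rank(vectors) == len(vectors)

independent = [(1, 0, 1), (0, 1, 1), (0, 0, 1)]
dependent = [(1, 0, 1), (0, 1, 1), (1, 1, 2)]  # third vector = first + second
```

Consistent with the remark above, dropping vectors from `independent` keeps it independent, while `dependent` fails the test because its third vector is a combination of the other two.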

Theorem 1.6 Let S be a set of vectors in V. 

1) S is linearly independent if and only if every vector in the span of 
S has a unique expression as a linear combination of the vectors 
in S.

2) S is linearly independent if and only if no vector in S is a linear
combination of the other vectors in S. ■

The relationship between minimal spanning sets and linear 
independence is described in the following key theorem. 

Theorem 1.7 Let S be a set of vectors in V. The following are 
equivalent. 

1) S is linearly independent and spans V. 

2) For every nonzero vector $v \in V$, there is a unique set of vectors $v_1,\dots,v_n$ in $S$, along with a unique set of nonzero scalars $r_1,\dots,r_n$ in $F$, for which

$$v = r_1 v_1 + \cdots + r_n v_n$$

3) S is a minimal spanning set in the sense that S spans V, and 
any proper subset of S does not span V. 

4) S is a maximal linearly independent set in the sense that S is 
linearly independent, but any proper superset of S is not linearly 
independent. 

Proof. We leave it to the reader to show that (1) and (2) are equivalent. Now suppose (1) holds. Then $S$ is a spanning set. If some proper subset $S'$ of $S$ also spanned $V$, then any vector in $S - S'$ would be a linear combination of the vectors in $S'$, contradicting the fact that the vectors in $S$ are linearly independent. Hence (1) implies (3).

Conversely, if $S$ is a minimal spanning set, then it must be linearly independent. For if not, some vector $s \in S$ would be a linear combination of the other vectors in $S$, and so $S - \{s\}$ would be a proper spanning subset of $S$, which is not possible. Hence (3) implies (1).

Suppose again that (1) holds. Then $S$ is linearly independent. If $S$ were not maximal, there would be a vector $v \in V - S$ for which the set $S \cup \{v\}$ is linearly independent. But then $v$ is not in the span of $S$, contradicting the fact that $S$ is a spanning set. Hence, $S$ is a maximal linearly independent set, and so (1) implies (4).

Conversely, if $S$ is a maximal linearly independent set, then it must span $V$, for if not, we could find a vector $v \in V - S$ that is not a linear combination of the vectors in $S$. Hence, $S \cup \{v\}$ would be a linearly independent proper superset of $S$, which is a contradiction. Thus, (4) implies (1). ■

Definition Any set of vectors in $V$ that is linearly independent and spans $V$ is called a basis for $V$. Thus, a set of vectors is a basis for $V$ if and only if it satisfies any (and hence all) of the conditions in Theorem 1.7. □

Corollary 1.8 A finite set $S = \{v_1,\dots,v_n\}$ of vectors in $V$ is a basis for $V$ if and only if

$$V = \langle v_1 \rangle \oplus \cdots \oplus \langle v_n \rangle$$ ■

Example 1.6 The $i$th standard vector in $F^n$ is the vector $e_i$ that has 0s in all coordinate positions except the $i$th, where it has a 1. Thus,

$$e_1 = (1,0,\dots,0),\quad e_2 = (0,1,0,\dots,0),\quad \dots,\quad e_n = (0,\dots,0,1)$$

The set $\{e_1,\dots,e_n\}$ is called the standard basis for $F^n$. □

The proof that every nontrivial vector space has a basis is a classic 
example of the use of Zorn’s lemma. 

Theorem 1.9 Any vector space, except the zero space {0}, has a basis. 

Proof. Let $V$ be a nonzero vector space, and consider the collection $\mathcal{A}$ of all linearly independent subsets of $V$. This collection is not empty, since any single nonzero vector forms a linearly independent set. Now, if $I_1 \subset I_2 \subset \cdots$ is a chain of linearly independent subsets of $V$, then the union

$$U = \bigcup I_i$$

is also a linearly independent set. Hence, every chain in $\mathcal{A}$ has an upper bound in $\mathcal{A}$, and according to Zorn's lemma, $\mathcal{A}$ must contain a maximal element, that is, $V$ has a maximal linearly independent set, which is a basis for $V$ by Theorem 1.7. ■

Theorem 1.7 makes it easy to prove the following useful result. 

Theorem 1.10 

1) Any linearly independent set of vectors in V is contained in a 
basis for V. That is, any linearly independent set can be extended 
to a basis for V. 

2) Any spanning set for V contains a basis for V. That is, any
spanning set can be reduced to a basis for V. ■

The reader can now show, using Theorem 1.10, that any subspace 
of a vector space has a complement. 



The Dimension of a Vector Space 

The next result, with its classical elegant proof, says that if a 
vector space V has a finite spanning set S, then the size of any 
linearly independent set cannot exceed the size of S.

Theorem 1.11 Let $V$ be a vector space, and assume that the vectors $v_1,\dots,v_n$ are linearly independent, and the vectors $s_1,\dots,s_m$ span $V$. Then $n \leq m$.

Proof. First, we list the two sets of vectors

$$s_1,\dots,s_m \ ;\quad v_1,\dots,v_n$$

Then we move the last vector $v_n$ to the front of the first list

$$v_n, s_1,\dots,s_m \ ;\quad v_1,\dots,v_{n-1}$$

Since $s_1,\dots,s_m$ span $V$, $v_n$ is a linear combination of the $s_i$'s. This implies that we may remove one of the $s_i$'s, say $s_j$, from the first list, and still have a spanning set

$$v_n, s_1,\dots,\hat{s}_j,\dots,s_m \ ;\quad v_1,\dots,v_{n-1}$$

where the hat $\hat{\ }$ means that the vector has been removed from the list.

Now we repeat the process, moving $v_{n-1}$ from the second list to the beginning of the first list

$$v_{n-1}, v_n, s_1,\dots,\hat{s}_j,\dots,s_m \ ;\quad v_1,\dots,v_{n-2}$$

As before, the vectors in the first list are linearly dependent, since they spanned $V$ before the inclusion of $v_{n-1}$. However, since the $v_i$'s are linearly independent, any nontrivial linear combination of the vectors in the first list that equals 0 must involve at least one of the $s_i$'s. Hence, we may remove that vector, say $s_k$, and still have a spanning set

$$v_{n-1}, v_n, s_1,\dots,\hat{s}_j,\dots,\hat{s}_k,\dots,s_m \ ;\quad v_1,\dots,v_{n-2}$$

It should be clear that this process can be continued until we run out of either the $s_i$'s or the $v_i$'s. However, if we run out of the $s_i$'s before the $v_i$'s, that is, if $m < n$, then the first list will be a proper subset of the $v_i$'s that spans $V$, which contradicts the independence of the $v_i$'s. Therefore, $m \geq n$. ■

Corollary 1.12 If V has a finite spanning set, then any two bases of 
V have the same size. ■

Now let us prove Corollary 1.12 for arbitrary vector spaces. 

Theorem 1.13 If V is a vector space, then any two bases for V have 
the same cardinality. 

Proof. We may assume that all bases for V are infinite sets, for if any 
basis is finite, then V has a finite spanning set, and so Corollary 1.12 
applies.

Let $\mathcal{B}$ be a basis for $V$. We may write $\mathcal{B} = \{b_i \mid i \in I\}$, where $I$ is the index set used to index the vectors in $\mathcal{B}$. Note that $|I| = |\mathcal{B}|$. Now let $\mathcal{C}$ be another basis for $V$. Then any vector $c \in \mathcal{C}$ can be written as a finite linear combination of the vectors in $\mathcal{B}$, where all of the coefficients are nonzero, say

$$c = \sum_{i \in U_c} r_i b_i$$

Here, $U_c$ is a finite subset of the index set $I$. Now, because $\mathcal{C}$ is a basis for $V$, the union of all of the $U_c$'s, as $c$ varies over $\mathcal{C}$, must be $I$, in symbols,

(1.2)  $$\bigcup_{c \in \mathcal{C}} U_c = I$$

For if all vectors in the basis $\mathcal{C}$ can be expressed as a finite linear combination of the vectors in $\mathcal{B} - \{b_k\}$, for some $k$, then all vectors in $V$ can be expressed in this manner, implying that $\mathcal{B} - \{b_k\}$ spans $V$, which is not the case.

Now, from (1.2), Theorem 0.16 implies that

$$|\mathcal{B}| = |I| \leq |\mathcal{C}|\,\aleph_0 = |\mathcal{C}|$$

But we may also reverse the roles of $\mathcal{B}$ and $\mathcal{C}$, to conclude that $|\mathcal{C}| \leq |\mathcal{B}|$, and so $|\mathcal{B}| = |\mathcal{C}|$. ■

Theorem 1.13 allows us to make the following definition. 

Definition A vector space V is finite dimensional if it is the zero space 
{0}, or if it has a finite basis. All other vector spaces are infinite 
dimensional. 

The dimension of the zero space is 0, and the dimension of any nonzero vector space $V$ is the cardinality of any basis for $V$. If a vector space $V$ has a basis of cardinality $\kappa$, we say that $V$ is $\kappa$-dimensional, and write $\dim(V) = \kappa$. □

It is easy to see that if $S$ is a subspace of $V$, then $\dim(S) \leq \dim(V)$. Furthermore, if $\dim(S) = \dim(V) < \infty$, then $S = V$.

Theorem 1.14 Let $S$ and $T$ be subspaces of a finite dimensional vector space $V$. Then

$$\dim(S) + \dim(T) = \dim(S + T) + \dim(S \cap T)$$

In particular, if $S^c$ is any complement of $S$ in $V$, then

$$\dim(S) + \dim(S^c) = \dim(V)$$ ■

Theorem 1.15 Let $V$ be a vector space.

1) If $\mathcal{B}$ is a basis for $V$, and if $\mathcal{B} = \mathcal{B}_1 \cup \mathcal{B}_2$ and $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$, then

$$V = \mathrm{span}(\mathcal{B}_1) \oplus \mathrm{span}(\mathcal{B}_2)$$

2) Let $V = S \oplus T$. If $\mathcal{B}_1$ is a basis for $S$ and $\mathcal{B}_2$ is a basis for $T$, then $\mathcal{B}_1 \cap \mathcal{B}_2 = \emptyset$ and $\mathcal{B}_1 \cup \mathcal{B}_2$ is a basis for $V$. ■
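As a quick sanity check of the dimension formula in Theorem 1.14, here is a small worked instance in $F^3$. The particular subspaces are chosen only for illustration and do not come from the text.

```latex
% Worked instance of dim(S) + dim(T) = dim(S + T) + dim(S \cap T) in F^3,
% using the standard vectors e_1, e_2, e_3.
\begin{align*}
S &= \langle e_1, e_2 \rangle, & \dim(S) &= 2,\\
T &= \langle e_2, e_3 \rangle, & \dim(T) &= 2,\\
S + T &= \langle e_1, e_2, e_3 \rangle = F^3, & \dim(S + T) &= 3,\\
S \cap T &= \langle e_2 \rangle, & \dim(S \cap T) &= 1,
\end{align*}
so that $\dim(S) + \dim(T) = 2 + 2 = 3 + 1 = \dim(S + T) + \dim(S \cap T)$.
```

Note that the sum $S + T$ here is not direct, since $S \cap T \neq \{0\}$, which is exactly why the correction term $\dim(S \cap T)$ is needed.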

The Row and Column Space of a Matrix 

Let $A$ be an $m \times n$ matrix over $F$. The rows of $A$ span a subspace of $F^n$ known as the row space of $A$, and the columns of $A$ span a subspace of $F^m$ known as the column space of $A$. The dimensions of these spaces are called the row rank and column rank, respectively. We denote the row space and row rank by $rs(A)$ and $rr(A)$, and the column space and column rank by $cs(A)$ and $cr(A)$.

It is a remarkable, and useful, fact that the row rank of a matrix is always equal to the column rank, despite the fact that if $m \neq n$, the row space and column space lie in different vector spaces.

To see this, let $A$ be an $m \times n$ matrix. Some subset of the rows of $A$ forms a basis for $rs(A)$. Let $A'$ be the submatrix of $A$ containing just these rows. Hence,

(1.3)  $$rr(A') = rr(A)$$

and

(1.4)  $$cr(A') \leq cr(A)$$

Consider the matrix $C$ obtained by throwing away all columns of $A'$, except those that form a basis for the column space of $A'$. Thus, $C$ is a matrix of size $rr(A') \times cr(A')$, whose $cr(A')$ columns form a basis for $cs(A')$, which is a subspace of $F^r$, where $r = rr(A')$. Hence,

(1.5)  $$cr(A') \leq rr(A')$$

We propose to show that $cr(A') = rr(A')$.

If $c_1,\dots,c_{k'}$ are the columns of $C$, where $k' = cr(A')$, then the $i$th column $a_i$ of $A'$ has the form

$$a_i = r_{1i} c_1 + \cdots + r_{k'i} c_{k'}$$

and so

$$a_i = C r_i, \quad\text{where}\quad r_i = \begin{bmatrix} r_{1i} \\ \vdots \\ r_{k'i} \end{bmatrix}$$

Hence, if $R$ is the $k' \times n$ matrix whose $i$th column is $r_i$, we have

$$A' = CR$$

Now, if $cr(A') < rr(A')$, then the matrix $C$ would have more rows than columns, hence linearly dependent rows, and so there would exist a nonzero row vector $v$ for which $vC = 0$. Hence,

$$vA' = vCR = 0$$

and so the rows of $A'$ would be linearly dependent, which is not the case. Therefore, $cr(A') \not< rr(A')$, and so (1.5) implies that

$$cr(A') = rr(A')$$

This, together with (1.3) and (1.4), gives

$$rr(A) = rr(A') = cr(A') \leq cr(A)$$

But we can apply the same reasoning to the transpose $A^T$ of $A$, to deduce that

$$rr(A^T) \leq cr(A^T)$$

and since $rr(A^T) = cr(A)$ and $cr(A^T) = rr(A)$, we have

$$cr(A) \leq rr(A)$$

which, together with the reverse inequality, gives $cr(A) = rr(A)$. (I am indebted to Professor William Gearhart for suggesting the above argument.) Let us summarize.

Theorem 1.16 For any matrix $A$, we have $rr(A) = cr(A)$. This number is called the rank of $A$, and denoted by $rk(A)$. ■
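Theorem 1.16 can be checked on a concrete matrix by computing the row rank of $A$ and the row rank of $A^T$ (which is the column rank of $A$). The following is a minimal sketch over the rationals; the function name `rank` and the sample matrix are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Row rank of a matrix (a list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below position r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# A 3 x 4 matrix whose third row is the sum of the first two.
A = [[1, 2, 0, 3],
     [2, 4, 1, 7],
     [3, 6, 1, 10]]
A_T = [list(col) for col in zip(*A)]

row_rank = rank(A)       # dimension of the row space, a subspace of Q^4
column_rank = rank(A_T)  # dimension of the column space, a subspace of Q^3
```

Even though the row space and column space of this matrix live in different spaces, the two computed ranks agree, as the theorem asserts.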



Coordinate Matrices 

From the point of view of the vector space operations, every 
n-dimensional vector space is essentially the same. To understand this 
statement more clearly, we make the following definition. 

Definition Let $\dim(V) = n$. An ordered basis for $V$ is an ordered $n$-tuple $(v_1,\dots,v_n)$ of vectors, for which the set $\{v_1,\dots,v_n\}$ is a basis for $V$. □

Thus, the only difference between a basis and an ordered basis is 
that we impose an order on the vectors in an ordered basis. 

Now, let us fix an ordered basis $\mathcal{B} = (v_1,\dots,v_n)$ for $V$. For each vector $v \in V$, there is a unique ordered $n$-tuple $(r_1,\dots,r_n)$ of scalars for which

$$v = r_1 v_1 + \cdots + r_n v_n$$

This allows us to associate to each vector $v \in V$ a unique column matrix of length $n$ as follows

(1.6)  $$v \mapsto [v]_{\mathcal{B}} = \begin{bmatrix} r_1 \\ \vdots \\ r_n \end{bmatrix}$$

The matrix $[v]_{\mathcal{B}}$ is known as the coordinate matrix of $v$ with respect to the ordered basis $\mathcal{B}$. Clearly, knowing the coordinate matrix $[v]_{\mathcal{B}}$ is just as good as knowing $v$ (assuming that we know $\mathcal{B}$).

Furthermore, performing linear operations on coordinate matrices has essentially the same effect as performing the same operations on the vectors in $V$. That is,

$$[u + v]_{\mathcal{B}} = [u]_{\mathcal{B}} + [v]_{\mathcal{B}}$$

and

$$[rv]_{\mathcal{B}} = r[v]_{\mathcal{B}}$$

or, more generally, for any vectors $u_1,\dots,u_k$ in $V$,

(1.7)  $$[r_1 u_1 + \cdots + r_k u_k]_{\mathcal{B}} = r_1 [u_1]_{\mathcal{B}} + \cdots + r_k [u_k]_{\mathcal{B}}$$

The association (1.6) defines a function

$$\phi_{\mathcal{B}}(v) = [v]_{\mathcal{B}}$$

from $V$ to $F^n$ (where we write the elements of $F^n$ as column vectors). Because $\mathcal{B}$ is a basis, it is easy to see that $\phi_{\mathcal{B}}$ is bijective. Moreover, (1.7) is equivalent to

$$\phi_{\mathcal{B}}(r_1 u_1 + \cdots + r_k u_k) = r_1 \phi_{\mathcal{B}}(u_1) + \cdots + r_k \phi_{\mathcal{B}}(u_k)$$

which says that $\phi_{\mathcal{B}}$ preserves the vector space operations. Functions from one vector space to another that preserve the vector space operations are called linear transformations and form the objects of study of the next chapter.
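When $V = \mathbb{Q}^n$, finding the coordinate matrix of a vector amounts to solving a linear system whose columns are the basis vectors. The sketch below is one way to do this; the function name `coordinates` and the sample basis are illustrative assumptions, and the code presumes that the input really is an ordered basis.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Coordinates of v relative to an ordered basis of Q^n.

    Solves r_1*b_1 + ... + r_n*b_n = v by Gauss-Jordan elimination on the
    augmented matrix whose columns are b_1, ..., b_n, v.
    """
    n = len(basis)
    # Row i of the augmented matrix: i-th entries of b_1..b_n, then v[i].
    m = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        # A pivot exists in every column because the b_i form a basis.
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

basis = [(1, 1), (1, -1)]
u, v = (3, 1), (0, 4)
coords_u = coordinates(basis, u)  # (3, 1) = 2*(1, 1) + 1*(1, -1)
coords_v = coordinates(basis, v)
coords_sum = coordinates(basis, (u[0] + v[0], u[1] + v[1]))
```

Comparing `coords_sum` with the entrywise sum of `coords_u` and `coords_v` illustrates the additivity $[u + v]_{\mathcal{B}} = [u]_{\mathcal{B}} + [v]_{\mathcal{B}}$.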




EXERCISES 

1. Show that the sum of any collection of subspaces of $V$ is a subspace of $V$, and that if $S, T \in \mathcal{S}(V)$, then $\mathrm{lub}\{S,T\} = S + T$.

2. Find a vector space V and a subset S of V that is a vector 

space, using operations that differ from those of V. 

3. Referring to Example 1.1, what are the subset relationships, if 
any, between Seq{C), Cq, and 

4. Prove that if $S, T \in \mathcal{S}(V)$, then $S \cup T \in \mathcal{S}(V)$ if and only if $S \subseteq T$ or $T \subseteq S$.

5. Let $S$, $T$ and $U$ be subspaces of $V$. Show that if $U \subseteq S$, then

$$S \cap (T + U) = (S \cap T) + U$$

This is called the modular law for the lattice $\mathcal{S}(V)$.

6. Show that the set Sym of all symmetric matrices of size $n \times n$ is a subspace of $\mathcal{M}_n$, as is the set SkewSym of all skew-symmetric matrices of size $n \times n$.

7. Prove that the first two statements in Theorem 1.7 are equivalent.

8. Show that any subspace of a vector space is a direct summand.

9. Let $\dim(V) < \infty$, and suppose that $V = U \oplus S_1$ and $V = U \oplus S_2$. What can you say about the relationship between $S_1$ and $S_2$?

10. Show that if $S$ is a subspace of a vector space $V$, then $\dim(S) \leq \dim(V)$. Furthermore, if $\dim(S) = \dim(V) < \infty$, then $S = V$. Give an example to show that the finiteness is required in the second statement. Hint: think about the vector space of polynomials $F[x]$.

11. What is the relationship between $S \oplus T$ and $T \oplus S$? Is the direct sum operation commutative? Formulate and prove a similar statement concerning associativity. Is there an "identity" for direct sum? What about "negatives"?

12. Prove that the vector space $\mathcal{F}$ of all functions from $\mathbb{R}$ to $\mathbb{R}$ is infinite dimensional.

13. Prove that the vector space $\mathcal{C}$ of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$ is infinite dimensional.

14. Let F be a field, and let V be an infinite dimensional vector 
space over F. What is the cardinality of V? 

15. If $\dim(V) = n$, does $V$ necessarily contain a subspace of any dimension $r$ satisfying $0 \leq r \leq n$?

16. Show that Theorem 1.2 does not hold if the base field F is finite. 

17. Let $S$ be a subspace of $V$. The set $v + S = \{v + s \mid s \in S\}$ is called an affine subspace of $V$.

a) Under what conditions is an affine subspace of $V$ a subspace of $V$?

b) Show that any two affine subspaces of the form $v + S$ and $w + S$ are either equal or disjoint.

18. If $V$ and $W$ are vector spaces over $F$ for which $|V| = |W|$, then must it be true that $\dim(V) = \dim(W)$?

19. Let $V$ be an $n$-dimensional vector space over a finite field $F$, with $|F| = q$. What is the cardinality of $V$?

20. Let F be a field. A subfield of F is a subset K of F that is a 
field in its own right, using the same operations as defined on F. 

a) Show that F is a vector space over a subfield K of F. 

b) Suppose that $F$ is an $m$-dimensional vector space over a subfield $K$ of $F$. If $V$ is an $n$-dimensional vector space over $F$, show that $V$ is also a vector space over $K$. What is the dimension of $V$ as a vector space over $K$?



Linear Transformations

Loosely speaking, a linear transformation is a function from one vector space to another that preserves the vector space operations. Let us be more precise.

Definition Let $V$ and $W$ be vector spaces over the same field $F$. A function $\tau: V \to W$ is said to be a linear transformation if

$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$

for all scalars $r, s \in F$ and vectors $u, v \in V$. A linear transformation $\tau: V \to V$ is called a linear operator on $V$. The set of all linear transformations from $V$ to $W$ is denoted by $\mathcal{L}(V,W)$, and the set of all linear operators on $V$ is denoted by $\mathcal{L}(V)$. □

We should mention that some authors use the term linear 
operator for any linear transformation from V to W.

Definition The following terms are also employed:

1) homomorphism for linear transformation

2) endomorphism for linear operator

3) monomorphism for injective linear transformation

4) epimorphism for surjective linear transformation

5) isomorphism for bijective linear transformation. □



Example 2.1

1) The derivative $D$ is a linear operator on the vector space of all infinitely differentiable functions on $\mathbb{R}$.

2) The integral operator $\tau: F[x] \to F[x]$ defined by

$$\tau(f) = \int_0^x f(t)\,dt$$

is a linear operator on $F[x]$.

3) Let $A$ be an $m \times n$ matrix over $F$. The function $\tau_A: F^n \to F^m$ defined by $\tau_A(v) = Av$, where all vectors are written as column vectors, is a linear transformation from $F^n$ to $F^m$. □



We next show that $\mathcal{L}(V,W)$ is a vector space in its own right. Moreover, the set $\mathcal{L}(V)$ has the structure of an algebra, as given by the following definition.



Definition Let $F$ be a field. An algebra $\mathcal{A}$ over $F$ is a nonempty set $\mathcal{A}$, together with three operations, called addition (denoted by $+$), multiplication (denoted by juxtaposition), and scalar multiplication (also denoted by juxtaposition), for which the following properties hold.

1) $\mathcal{A}$ is a vector space under addition and scalar multiplication.

2) $\mathcal{A}$ is a ring under addition and multiplication.

3) If $r \in F$ and $a, b \in \mathcal{A}$ then

$$r(ab) = (ra)b = a(rb)$$ □

Thus, an algebra is a vector space in which we can take the 
product of vectors, or a ring in which we can multiply each element by 
a scalar (subject, of course, to additional requirements, as given in the 
definition). We now return to linear transformations. 

Theorem 2.1 

1) The set $\mathcal{L}(V,W)$ is a vector space, under ordinary addition of functions and scalar multiplication of functions by elements of $F$.

2) If $\sigma \in \mathcal{L}(U,V)$ and $\tau \in \mathcal{L}(V,W)$, then the composition $\tau\sigma$ is in $\mathcal{L}(U,W)$.

3) If $\tau \in \mathcal{L}(V,W)$ is bijective, then $\tau^{-1} \in \mathcal{L}(W,V)$.

4) The vector space $\mathcal{L}(V)$ is an algebra, where multiplication is composition of functions. The identity map $\iota \in \mathcal{L}(V)$ is the multiplicative identity, and the zero map $0 \in \mathcal{L}(V)$ is the additive identity.

Proof. We prove only part 3. Let $\tau: V \to W$ be a bijective linear transformation. Then $\tau^{-1}: W \to V$ is a well-defined function, and since any two vectors $w_1$ and $w_2$ in $W$ have the form $w_1 = \tau(v_1)$ and $w_2 = \tau(v_2)$, we have

$$\begin{aligned}
\tau^{-1}(rw_1 + sw_2) &= \tau^{-1}(r\tau(v_1) + s\tau(v_2)) \\
&= \tau^{-1}(\tau(rv_1 + sv_2)) \\
&= rv_1 + sv_2 \\
&= r\tau^{-1}(w_1) + s\tau^{-1}(w_2)
\end{aligned}$$

which shows that $\tau^{-1}$ is linear. ■

One of the easiest ways to define a linear transformation is to give 
its values on a basis. The following theorem says that we may assign 
these values arbitrarily and thereby obtain a unique linear 
transformation. 

Theorem 2.2 Let $\mathcal{B}$ be a basis for $V$, and let $W$ be a vector space. Then we can define a linear transformation $\tau \in \mathcal{L}(V,W)$ by specifying the values of $\tau(b) \in W$ arbitrarily, for all $b \in \mathcal{B}$, and extending the domain of $\tau$ to all of $V$ by linearity. Moreover, if $\tau, \sigma \in \mathcal{L}(V,W)$ have the property that $\tau(b) = \sigma(b)$ for all $b \in \mathcal{B}$, then $\tau = \sigma$.

Proof. Once $\tau$ is defined on the basis vectors in $\mathcal{B}$, we extend the definition of $\tau$ by letting

$$\tau(r_1 b_1 + \cdots + r_n b_n) = r_1 \tau(b_1) + \cdots + r_n \tau(b_n)$$

The crucial point is that this is well-defined, since each vector in $V$ has a unique representation as a linear combination of a finite number of vectors in $\mathcal{B}$. We leave proof of the linearity of $\tau$, and the uniqueness, to the reader. ■

If $\tau \in \mathcal{L}(V,W)$, and if $S$ is a subspace of $V$, then we may restrict the domain of $\tau$ to $S$. The resulting map, denoted by $\tau|_S$, is a linear transformation from $S$ to $W$, and is called the restriction of $\tau$ to $S$. We will have many occasions to use this concept in the sequel.



The Kernel and Image of a Linear Transformation

There are two very important vector spaces associated with a linear transformation $\tau$ from $V$ to $W$.

Definition Let $\tau \in \mathcal{L}(V,W)$. The set

$$\ker(\tau) = \{v \in V \mid \tau(v) = 0\}$$

is called the kernel of $\tau$, and the set

$$\mathrm{im}(\tau) = \{\tau(v) \mid v \in V\}$$

is called the image of $\tau$. The dimension of $\ker(\tau)$ is called the nullity of $\tau$, and is denoted by $null(\tau)$. The dimension of $\mathrm{im}(\tau)$ is called the rank of $\tau$, and is denoted by $rk(\tau)$. □

It is routine to show that $\ker(\tau)$ is a subspace of $V$ and $\mathrm{im}(\tau)$ is a subspace of $W$. Moreover, we have the following.

Theorem 2.3 Let $\tau \in \mathcal{L}(V,W)$. Then

1) $\tau$ is surjective if and only if $\mathrm{im}(\tau) = W$

2) $\tau$ is injective if and only if $\ker(\tau) = \{0\}$

Proof. The first statement is merely a restatement of the definition of surjectivity. To see the validity of the second statement, observe that

(2.1)  $$\tau(u) = \tau(v) \ \Leftrightarrow\ \tau(u - v) = 0 \ \Leftrightarrow\ u - v \in \ker(\tau)$$

Hence, if $\ker(\tau) = \{0\}$, then

$$\tau(u) = \tau(v) \ \Leftrightarrow\ u = v$$

which shows that $\tau$ is injective. Conversely, if $\tau$ is injective, then (2.1) implies that

$$u - v = 0 \ \Leftrightarrow\ u = v \ \Leftrightarrow\ u - v \in \ker(\tau)$$

and so, letting $w = u - v$, we get $w = 0$ if and only if $w \in \ker(\tau)$, that is, $\ker(\tau) = \{0\}$. ■



Isomorphisms 

Definition A bijective linear transformation $\tau: V \to W$ is called an isomorphism from $V$ to $W$. When an isomorphism from $V$ to $W$ exists, we say that $V$ and $W$ are isomorphic, and write $V \approx W$. □

Example 2.2 Let $\dim(V) = n$. For any ordered basis $\mathcal{B}$ of $V$, the map $\phi_{\mathcal{B}}: V \to F^n$ that sends each vector $v$ to its coordinate matrix $[v]_{\mathcal{B}}$ is an isomorphism. Hence, any $n$-dimensional vector space over $F$ is isomorphic to $F^n$. We will refer to the map $\phi_{\mathcal{B}}$ many times in the sequel. □

When two vector spaces are isomorphic, they behave in essentially 
the same way with respect to the concepts that depend only on linear 
operations. The following result provides support for this view. 

Theorem 2.4 Suppose that $\tau \in \mathcal{L}(V,W)$ is an isomorphism. Let $S$ be a set of vectors in $V$, and let

$$\tau(S) = \{\tau(s) \mid s \in S\}$$

be the set of images of the vectors in $S$. Then

1) $S$ spans $V$ if and only if $\tau(S)$ spans $W$.

2) $S$ is linearly independent in $V$ if and only if $\tau(S)$ is linearly independent in $W$.

3) $S$ is a basis for $V$ if and only if $\tau(S)$ is a basis for $W$. ■

An isomorphism can be characterized as a linear transformation $\tau: V \to W$ that maps a basis for $V$ to a basis for $W$.

Theorem 2.5 Let $\tau \in \mathcal{L}(V,W)$. If $\mathcal{B}$ is a basis for $V$ and if

$$\tau(\mathcal{B}) = \{\tau(b) \mid b \in \mathcal{B}\}$$

is a basis for $W$, then $\tau$ is an isomorphism from $V$ onto $W$. ■

The following theorem says that, up to isomorphism, there is only one vector space of any given dimension.

Theorem 2.6 Let $V$ and $W$ be vector spaces over $F$. Then $V \approx W$ if and only if $\dim(V) = \dim(W)$. ■

In Example 2.2, we saw that any $n$-dimensional vector space is isomorphic to $F^n$. To examine the infinite dimensional counterpart of this result, let us observe that each $n$-tuple $v = (a_1,\dots,a_n) \in F^n$ can be thought of as a function $f: \{1,\dots,n\} \to F$, where the value of $f(i)$ is simply the $i$th coordinate $a_i$ of $v$. Hence, we may think of $F^n$ as the set $F^B$ of all functions from the set $B = \{1,\dots,n\}$ to the set $F$.

Now let us generalize this to include the infinite case. Suppose that $B$ is any set (finite or infinite), called an index set. Recall that the vector space of all functions from $B$ to the field $F$ is denoted by $F^B$, and that the set of all functions in $F^B$ that have finite support is denoted by $(F^B)_0$.







Since any linear combination of functions with finite support also has finite support, $(F^B)_0$ is a subspace of $F^B$. We leave it to the reader to show that the functions $\delta_b \in (F^B)_0$ defined for all $b \in B$, by

$$\delta_b(x) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \neq b \end{cases}$$

form a basis for $(F^B)_0$, called the standard basis. Hence, $\dim((F^B)_0) = |B|$.

Theorem 2.7 If $n$ is a natural number, then any $n$-dimensional vector space over $F$ is isomorphic to $F^n$. If $\kappa$ is any cardinal number, and if $B$ is a set of cardinality $\kappa$, then any $\kappa$-dimensional vector space over $F$ is isomorphic to the vector space $(F^B)_0$ of all functions from $B$ to $F$ with finite support. ■

The Rank Plus Nullity Theorem 

Let $\tau \in \mathcal{L}(V,W)$. We know that $\ker(\tau)$, being a subspace of $V$, has a complement $\ker(\tau)^c$, that is,

(2.2)  $$V = \ker(\tau) \oplus \ker(\tau)^c$$

Let $\mathcal{K}$ be a basis for $\ker(\tau)$, and let $\mathcal{C}$ be a basis for $\ker(\tau)^c$. Since $\mathcal{K} \cap \mathcal{C} = \emptyset$ and $\mathcal{K} \cup \mathcal{C}$ is a basis for $V$, we have

$$\dim(V) = \dim(\ker(\tau)) + \dim(\ker(\tau)^c)$$

Now, the restriction of $\tau$ to $\ker(\tau)^c$, which we denote by

$$\tau^c: \ker(\tau)^c \to \mathrm{im}(\tau)$$

is easily seen to be an isomorphism. In fact, $\tau^c$ is injective, since if $v \in \ker(\tau)^c$ (which is the domain of $\tau^c$) and $\tau^c(v) = 0$, then $\tau(v) = 0$, and so $v \in \ker(\tau)^c \cap \ker(\tau) = \{0\}$, which implies that $v = 0$.

To see that $\tau^c$ is surjective, suppose that $\tau(v) \in \mathrm{im}(\tau)$. Then (2.2) implies that $v = u + w$, where $u \in \ker(\tau)$ and $w \in \ker(\tau)^c$. Therefore,

$$\tau(v) = \tau(u) + \tau(w) = \tau(w) = \tau^c(w)$$

which shows that $\tau(v) \in \mathrm{im}(\tau^c)$. Hence, $\mathrm{im}(\tau) \subseteq \mathrm{im}(\tau^c)$, and since the reverse inclusion is obvious, we have $\mathrm{im}(\tau^c) = \mathrm{im}(\tau)$. Hence, $\tau^c$ is onto $\mathrm{im}(\tau)$.

Thus, the restriction $\tau^c$ of $\tau$ to $\ker(\tau)^c$ is an isomorphism, and

$$\ker(\tau)^c \approx \mathrm{im}(\tau)$$

From this, we deduce several useful facts, which are given in the following theorem.

Theorem 2.8 Let $\tau \in \mathcal{L}(V,W)$. Then we have the following:

1) $\ker(\tau)^c \approx \mathrm{im}(\tau)$

2) (The rank plus nullity theorem)

$$\dim(\ker(\tau)) + \dim(\mathrm{im}(\tau)) = \dim(V)$$

or, in other notation,

$$rk(\tau) + null(\tau) = \dim(V)$$

3) If $S$ is a subspace of $V$, then all complements of $S$ are isomorphic.

4) If $S$ is a subspace of $V$, then

$$\dim(S) + \dim(S^c) = \dim(V)$$

Proof. Part (1) has been proven. For part (2), let $\mathcal{K}$ be a basis for $\ker(\tau)$ and $\mathcal{C}$ a basis for $\ker(\tau)^c$. Because $\mathcal{K}$ and $\mathcal{C}$ are disjoint and $|\mathcal{C}| = |\tau(\mathcal{C})| = \dim(\mathrm{im}(\tau))$,

$$\dim(V) = |\mathcal{K} \cup \mathcal{C}| = |\mathcal{K}| + |\mathcal{C}| = \dim(\ker(\tau)) + \dim(\mathrm{im}(\tau))$$

To prove parts (3) and (4), let $S$ be a subspace of $V$, and let $T$ be a particular complement of $S$. Thus $V = S \oplus T$. Define a map $\rho: V \to T$ as follows. Any $v \in V$ has the form $v = u + w$, where $u \in S$ and $w \in T$. Let $\rho(v) = w$. We leave it to the reader to show that $\rho \in \mathcal{L}(V)$, and that $\ker(\rho) = S$ and $\mathrm{im}(\rho) = T$. Now, part (1), applied to the map $\rho$, says that $S^c \approx T$, for any complement $S^c$ of $S$. Hence, any two complements of $S$ are isomorphic to $T$, and therefore to each other. Part (4) follows from part (2), applied to the map $\rho$. ■

Theorem 2.8 has an important corollary.

Corollary 2.9 Let $\tau \in \mathcal{L}(V,W)$, where $\dim(V) = \dim(W) < \infty$. Then $\tau$ is injective if and only if it is surjective. ■
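The rank plus nullity theorem can be observed on a concrete matrix map $v \mapsto Av$ over the rationals. The sketch below row-reduces $A$, reads off the rank from the pivot columns, and builds one kernel vector per free column; the names `rref`, `kernel_basis` and `apply` are hypothetical helpers, and the matrix is chosen only for illustration.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over Q; returns (matrix, pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def kernel_basis(rows):
    """A basis of {v : Av = 0}, with one basis vector per free (non-pivot) column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            v[pc] = -m[row_index][free]
        basis.append(v)
    return basis

def apply(rows, v):
    """Matrix-vector product Av."""
    return [sum(a * x for a, x in zip(row, v)) for row in rows]

A = [[1, 2, 0, 3],
     [2, 4, 1, 7],
     [3, 6, 1, 10]]
_, pivot_columns = rref(A)
rank_A = len(pivot_columns)        # dimension of the image (column space)
nullity_A = len(kernel_basis(A))   # dimension of the kernel
```

Here the domain is $\mathbb{Q}^4$, so the theorem predicts `rank_A + nullity_A == 4`. Each vector returned by `kernel_basis` can also be checked directly to satisfy $Av = 0$.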

Linear Transformations from $F^n$ to $F^m$

For any $m \times n$ matrix $A$ over $F$, we can define the map "multiplication by $A$"

$$\tau_A(v) = Av$$

where $v \in F^n$ is written in column form. Note that $\tau_A$ is a function from $F^n$ to $F^m$. It is easy to see that $\tau_A$ is a linear transformation. As it happens, all linear transformations $\tau \in \mathcal{L}(F^n, F^m)$ have the form $\tau_A$ for some $m \times n$ matrix $A$. To see this, recall that $\tau$ is uniquely defined by its values on a basis for $F^n$, in particular, by its values on the standard basis

$$\tau(e_1),\dots,\tau(e_n)$$

Now, if $A$ is an $m \times n$ matrix, then

$$Ae_i = i\text{th column of } A$$

Hence, if we let $A$ be the $m \times n$ matrix with columns $\tau(e_1),\dots,\tau(e_n)$, then

$$Ae_i = \tau(e_i)$$

and so $\tau$ and $\tau_A$ agree on the standard basis vectors, which implies that $\tau = \tau_A$. Let us summarize.



Theorem 2.10

1) Let $A$ be an $m \times n$ matrix over $F$. The map $\tau_A$ defined by $\tau_A(v) = Av$ is in $\mathcal{L}(F^n, F^m)$.

2) Conversely, let $\tau \in \mathcal{L}(F^n, F^m)$. Then there exists a unique $m \times n$ matrix $A$ over $F$ for which $\tau = \tau_A$. This matrix is called the standard matrix for $\tau$. The $i$th column of $A$ is $\tau(e_i)$. ■



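Example 2.3 Consider the linear transformation $\tau: F^3 \to F^3$ defined by

$$\tau(x,y,z) = (x - 2y,\ z,\ x + y + z)$$

Then we have, in column form

$$\tau\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x - 2y \\ z \\ x + y + z \end{bmatrix} = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

and so $\tau = \tau_A$, where

$$A = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$ □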
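The recipe of Theorem 2.10 (the $i$th column of the standard matrix is $\tau(e_i)$) can be carried out mechanically for the map of Example 2.3. This is only an illustrative sketch; the function names `tau`, `standard_matrix` and `mat_vec` are not from the text.

```python
def tau(v):
    """The map of Example 2.3: (x, y, z) -> (x - 2y, z, x + y + z)."""
    x, y, z = v
    return (x - 2 * y, z, x + y + z)

def standard_matrix(f, n):
    """Matrix whose i-th column is f(e_i), returned as a list of rows."""
    columns = [f(tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]
    return [list(row) for row in zip(*columns)]

def mat_vec(a, v):
    """Matrix-vector product Av."""
    return tuple(sum(x * y for x, y in zip(row, v)) for row in a)

A = standard_matrix(tau, 3)
sample = (2, -1, 5)
via_matrix = mat_vec(A, sample)  # multiplication by A
via_map = tau(sample)            # direct evaluation of the map
```

The matrix `A` built this way agrees with the one displayed in the example, and multiplying by it reproduces `tau` on the sample vector.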



Since the image of $\tau_A$ is the column space of $A$, we have

$$\dim(\ker(\tau_A)) + rk(A) = \dim(F^n) = n$$

This gives the following useful result.

Theorem 2.11 Let $A$ be an $m \times n$ matrix over $F$.

1) $\tau_A$ is injective if and only if $rk(A) = n$.

2) $\tau_A$ is surjective if and only if $rk(A) = m$. ■



Change of Basis Matrices

Let $V$ be a vector space, and suppose that $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_n)$ are ordered bases for $V$. It is a natural question to ask how the coordinate matrices $[v]_{\mathcal{B}}$ and $[v]_{\mathcal{C}}$ are related. Figure 2.1 illustrates the situation.



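Figure 2.1 [diagram of the coordinate maps $\phi_{\mathcal{B}}, \phi_{\mathcal{C}}: V \to F^n$ and the map $\theta = \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}$ between the two copies of $F^n$]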



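Our interest lies in finding an expression for the linear transformation $\theta = \phi_{\mathcal{C}}\phi_{\mathcal{B}}^{-1}$, since it is the map that describes the relationship between $[v]_{\mathcal{B}}$ and $[v]_{\mathcal{C}}$, namely

$$\theta([v]_{\mathcal{B}}) = [v]_{\mathcal{C}}$$

Since $\theta \in \mathcal{L}(F^n, F^n)$, we know that $\theta = \tau_A$ for some $n \times n$ matrix $A$, and furthermore

$$i\text{th column of } A = Ae_i = \tau_A(e_i) = \theta(e_i) = \theta([b_i]_{\mathcal{B}}) = [b_i]_{\mathcal{C}}$$

Thus, $A$ is just the matrix whose $i$th column is the coordinate matrix $[b_i]_{\mathcal{C}}$ of the $i$th "old" basis vector with respect to the "new" basis $\mathcal{C}$. This matrix is called the change of basis matrix from $\mathcal{B}$ to $\mathcal{C}$, which we will denote by $M_{\mathcal{B},\mathcal{C}}$.

Theorem 2.12 Let $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C}$ be ordered bases for a vector space $V$. Then

$$[v]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}}$$

where the change of basis matrix $M_{\mathcal{B},\mathcal{C}}$ is the matrix whose $i$th column is $[b_i]_{\mathcal{C}}$. ■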

It is worth remarking that given any invertible matrix $A$ and any ordered basis $\mathcal{B}$ for $V$, we can find an ordered basis $\mathcal{C}$ for which $A = M_{\mathcal{B},\mathcal{C}}$. To see this, we look again at Figure 2.1, which shows that we need

$$A[v]_{\mathcal{B}} = \phi_{\mathcal{C}}(v)$$

But $\phi_{\mathcal{C}}(v) = e_i$ if and only if $v = c_i$ is the $i$th basis vector of $\mathcal{C}$, and so we seek to solve the equation

$$A[v]_{\mathcal{B}} = e_i$$

for $v = c_i$. This is done simply by multiplying both sides by $A^{-1}$, to get

$$[v]_{\mathcal{B}} = A^{-1}e_i = i\text{th column of } A^{-1}$$

Hence, $c_i$ is the vector whose coordinate matrix with respect to $\mathcal{B}$ is the $i$th column of $A^{-1}$.
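To make the change of basis matrix concrete, the following sketch builds $M_{\mathcal{B},\mathcal{C}}$ for two ordered bases of $\mathbb{Q}^2$ by computing the coordinates of each old basis vector relative to the new basis, then checks the relation $[v]_{\mathcal{C}} = M_{\mathcal{B},\mathcal{C}}[v]_{\mathcal{B}}$ on a sample vector. The helper names and the two bases are assumptions made purely for illustration.

```python
from fractions import Fraction

def coordinates(basis, v):
    """Coordinates of v relative to an ordered basis of Q^n (Gauss-Jordan elimination)."""
    n = len(basis)
    # Row i of the augmented matrix: i-th entries of the basis vectors, then v[i].
    m = [[Fraction(b[i]) for b in basis] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        # A pivot exists in every column because the input is assumed to be a basis.
        p = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[p] = m[p], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[c])]
    return [row[n] for row in m]

def change_of_basis(old, new):
    """Matrix whose i-th column is the coordinate matrix of old[i] relative to new."""
    columns = [coordinates(new, b) for b in old]
    return [list(row) for row in zip(*columns)]

def mat_vec(a, v):
    """Matrix-vector product Av."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

B = [(1, 0), (1, 1)]    # "old" ordered basis
C = [(1, 1), (1, -1)]   # "new" ordered basis
M = change_of_basis(B, C)

v = (5, 3)
v_B = coordinates(B, v)
v_C = coordinates(C, v)
```

Multiplying `M` by `v_B` should reproduce `v_C`, which is exactly the statement of Theorem 2.12 for this pair of bases.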

The Matrix of a Linear Transformation 

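Figure 2.2 shows a linear transformation $\tau: V \to W$, along with the pair of linear transformations $\phi_{\mathcal{B}}$ and $\phi_{\mathcal{C}}$ used to represent vectors in $V$ and $W$ in terms of coordinate matrices.

Figure 2.2 [diagram of $\tau: V \to W$, the coordinate maps $\phi_{\mathcal{B}}: V \to F^n$ and $\phi_{\mathcal{C}}: W \to F^m$, and the map $\theta: F^n \to F^m$]

Once the ordered bases $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_m)$ have been fixed, the linear transformation $\theta = \phi_{\mathcal{C}}\,\tau\,\phi_{\mathcal{B}}^{-1}$ uniquely determines $\tau$, since it determines $\tau(v)$ for all $v \in V$, by means of

$$\tau(v) = \phi_{\mathcal{C}}^{-1}\,\theta\,\phi_{\mathcal{B}}(v)$$

Moreover, since $\theta \in \mathcal{L}(F^n, F^m)$, we know that $\theta = \tau_A$, for some $m \times n$ matrix $A$. In fact,

$$i\text{th column of } A = Ae_i = \tau_A(e_i) = \theta(e_i) = \theta([b_i]_{\mathcal{B}}) = [\tau(b_i)]_{\mathcal{C}}$$

This gives the following result.

Theorem 2.13 Let $\tau \in \mathcal{L}(V,W)$, and let $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_m)$ be ordered bases for $V$ and $W$, respectively. Then, referring to Figure 2.2, $\tau$ can be represented by a linear transformation $\tau_A \in \mathcal{L}(F^n, F^m)$, that is

$$[\tau(v)]_{\mathcal{C}} = A[v]_{\mathcal{B}}$$

where $A = [\tau]_{\mathcal{B},\mathcal{C}}$ is the matrix whose $i$th column is $[\tau(b_i)]_{\mathcal{C}}$.

We call $[\tau]_{\mathcal{B},\mathcal{C}}$ the matrix of $\tau$ with respect to the bases $\mathcal{B}$ and $\mathcal{C}$. Thus,

$$[\tau(v)]_{\mathcal{C}} = [\tau]_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}}$$

When $V = W$ and $\mathcal{B} = \mathcal{C}$, we denote the matrix $[\tau]_{\mathcal{B},\mathcal{B}}$ of the linear operator $\tau \in \mathcal{L}(V)$ with respect to the basis $\mathcal{B}$ by $[\tau]_{\mathcal{B}}$. Thus,

$$[\tau(v)]_{\mathcal{B}} = [\tau]_{\mathcal{B}}\,[v]_{\mathcal{B}}$$ ■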



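Example 2.4 Let $D\colon \mathcal{P}_2 \to \mathcal{P}_2$ be the derivative operator, defined on the vector space $\mathcal{P}_2$ of all polynomials of degree at most 2. Let $\mathcal{B} = \mathcal{C} = (1, x, x^2)$. Then

$$[D(1)]_{\mathcal{C}} = [0]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad [D(x)]_{\mathcal{C}} = [1]_{\mathcal{C}} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad [D(x^2)]_{\mathcal{C}} = [2x]_{\mathcal{C}} = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}$$

and so

$$[D]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}$$

Hence, for example, if $p(x) = 5 + x + 2x^2$, then

$$[Dp(x)]_{\mathcal{C}} = [D]_{\mathcal{B}}\,[p(x)]_{\mathcal{B}} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \\ 0 \end{pmatrix}$$

and so $Dp(x) = 1 + 4x$. □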
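The computation in Example 2.4 can be checked mechanically. The following is a minimal sketch in plain Python; the helper names `derivative_matrix` and `mat_vec` are illustrative and not part of the text. It builds the matrix of D column by column from the images of the basis vectors, exactly as Theorem 2.13 prescribes.

```python
from fractions import Fraction

def derivative_matrix(n):
    """Matrix of D on polynomials of degree <= n w.r.t. the basis (1, x, ..., x^n).

    Column j holds the coordinates of D(x^j) = j * x^(j-1).
    """
    size = n + 1
    M = [[Fraction(0)] * size for _ in range(size)]
    for j in range(1, size):
        M[j - 1][j] = Fraction(j)
    return M

def mat_vec(M, v):
    """Multiply the matrix M (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

D = derivative_matrix(2)
p = [Fraction(5), Fraction(1), Fraction(2)]  # coordinates of 5 + x + 2x^2
Dp = mat_vec(D, p)                           # coordinates of the derivative
print(D, Dp)
```

Here `Dp` holds the coordinates (1, 4, 0), which correspond to the polynomial 1 + 4x.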

The following result shows that we may work equally well with linear transformations or with the matrices that represent them (with respect to fixed ordered bases $\mathcal{B}$ and $\mathcal{C}$). This applies not only to addition and scalar multiplication, but also to multiplication.

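Theorem 2.14 Let V and W be vector spaces over F, with ordered bases $\mathcal{B} = (b_1,\dots,b_n)$ and $\mathcal{C} = (c_1,\dots,c_m)$, respectively.

1) The map $\phi\colon \mathcal{L}(V,W) \to \mathcal{M}_{m,n}(F)$ defined by

$$\phi(\tau) = [\tau]_{\mathcal{B},\mathcal{C}}$$

is an isomorphism. Thus

$$\mathcal{L}(V,W) \approx \mathcal{M}_{m,n}(F)$$

2) Furthermore, if $\sigma\colon U \to V$ and $\tau\colon V \to W$, and if $\mathcal{B}$, $\mathcal{C}$ and $\mathcal{D}$ are ordered bases for U, V and W, respectively, then

$$[\tau\sigma]_{\mathcal{B},\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma]_{\mathcal{B},\mathcal{C}}$$

In loose terms, the matrix of the product (composition) $\tau\sigma$ is the product of the matrices of $\tau$ and $\sigma$.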

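Proof. To see that $\phi$ is linear, observe that the $i$th column of the matrix $[s\sigma + t\tau]_{\mathcal{B},\mathcal{C}}$ is

$$[(s\sigma + t\tau)(b_i)]_{\mathcal{C}} = [s\sigma(b_i) + t\tau(b_i)]_{\mathcal{C}} = s[\sigma(b_i)]_{\mathcal{C}} + t[\tau(b_i)]_{\mathcal{C}}$$

and so

$$\phi(s\sigma + t\tau) = [s\sigma + t\tau]_{\mathcal{B},\mathcal{C}} = s[\sigma]_{\mathcal{B},\mathcal{C}} + t[\tau]_{\mathcal{B},\mathcal{C}} = s\phi(\sigma) + t\phi(\tau)$$

The map $\phi$ is surjective, for if A is an $m \times n$ matrix, we simply define $\tau\colon V \to W$ by the condition

$$[\tau(v)]_{\mathcal{C}} = A[v]_{\mathcal{B}}$$

that is,

$$\tau = \phi_{\mathcal{C}}^{-1}\,\tau_A\,\phi_{\mathcal{B}}$$

Then $\phi(\tau) = A$. Finally, $\phi$ is injective, since

$$[\tau]_{\mathcal{B},\mathcal{C}} = 0 \;\Rightarrow\; [\tau(b_i)]_{\mathcal{C}} = 0 \text{ for all } i \;\Rightarrow\; \tau(b_i) = 0 \text{ for all } i \;\Rightarrow\; \tau = 0$$

Thus, $\phi$ is an isomorphism.

To prove part (2), observe that

$$[\sigma(v)]_{\mathcal{C}} = [\sigma]_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}} \quad\text{and}\quad [\tau(w)]_{\mathcal{D}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[w]_{\mathcal{C}}$$

Therefore,

$$[\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma]_{\mathcal{B},\mathcal{C}}\,[v]_{\mathcal{B}} = [\tau]_{\mathcal{C},\mathcal{D}}\,[\sigma(v)]_{\mathcal{C}} = [\tau(\sigma(v))]_{\mathcal{D}} = [\tau\sigma]_{\mathcal{B},\mathcal{D}}\,[v]_{\mathcal{B}}$$

from which part (2) follows. ■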
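Part (2) of Theorem 2.14 can be illustrated numerically. The sketch below assumes two hypothetical maps `sigma` and `tau` acting on coordinate vectors with respect to the standard bases; they are chosen only for illustration. The matrix of each map is built from the images of the standard basis vectors, and the matrix of the composition is compared with the product of the two matrices.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matrix_of(linear_map, n_in):
    """Matrix whose j-th column is linear_map applied to the j-th standard basis vector."""
    cols = []
    for j in range(n_in):
        e = [1 if k == j else 0 for k in range(n_in)]
        cols.append(linear_map(e))
    return [list(row) for row in zip(*cols)]  # turn the list of columns into rows

# Hypothetical example maps: sigma maps F^2 to F^3, tau maps F^3 to F^2.
def sigma(v):
    x, y = v
    return [x + y, 2 * y, x]

def tau(w):
    a, b, c = w
    return [a - c, b + 3 * c]

def tau_sigma(v):
    return tau(sigma(v))

S = matrix_of(sigma, 2)       # 3 x 2 matrix of sigma
T = matrix_of(tau, 3)         # 2 x 3 matrix of tau
TS = matrix_of(tau_sigma, 2)  # 2 x 2 matrix of the composition
print(mat_mul(T, S) == TS)
```

The order of the factors matters: the matrix of $\tau\sigma$ is the matrix of $\tau$ times the matrix of $\sigma$, not the reverse (which here is not even the right shape for a 2 x 2 result).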



Change of Bases for Linear Transformations 

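Since the matrix $[\tau]_{\mathcal{B},\mathcal{C}}$ depends on the ordered bases $\mathcal{B}$ and $\mathcal{C}$, it is natural to wonder how to choose these bases in order to make this matrix as simple as possible. For instance, can we always choose the bases so that $\tau$ is represented by a diagonal matrix?

As we will see in Chapter 7, the answer to this question is no. In that chapter, we will take up the general question of how best to represent a linear operator by a matrix. For now, let us take the first step and describe the relationship between the matrices of $\tau$ with respect to two different pairs $(\mathcal{B},\mathcal{C})$ and $(\mathcal{B}',\mathcal{C}')$ of ordered bases.

Figure 2.3 describes the situation.

Figure 2.3

As we can see from this figure, writing $A = [\tau]_{\mathcal{B},\mathcal{C}}$ and $A' = [\tau]_{\mathcal{B}',\mathcal{C}'}$, $\tau$ can be written in two ways

$$\tau = \phi_{\mathcal{C}}^{-1}\,\tau_A\,\phi_{\mathcal{B}} \quad\text{and}\quad \tau = \phi_{\mathcal{C}'}^{-1}\,\tau_{A'}\,\phi_{\mathcal{B}'}$$

Equating these two expressions gives

$$\phi_{\mathcal{C}'}^{-1}\,\tau_{A'}\,\phi_{\mathcal{B}'} = \phi_{\mathcal{C}}^{-1}\,\tau_A\,\phi_{\mathcal{B}}$$

or

$$\tau_{A'} = (\phi_{\mathcal{C}'}\,\phi_{\mathcal{C}}^{-1})\,\tau_A\,(\phi_{\mathcal{B}}\,\phi_{\mathcal{B}'}^{-1})$$

or, in matrix terms

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}}$$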

This gives the following. 

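Theorem 2.15 Let $\tau \in \mathcal{L}(V,W)$, and let $(\mathcal{B},\mathcal{C})$ and $(\mathcal{B}',\mathcal{C}')$ be pairs of ordered bases of V and W, respectively. Then the matrix of $\tau$ with respect to the ordered bases $(\mathcal{B}',\mathcal{C}')$ can be expressed in terms of the matrix of $\tau$ with respect to the ordered bases $(\mathcal{B},\mathcal{C})$ as follows

(2.3) $[\tau]_{\mathcal{B}',\mathcal{C}'} = M_{\mathcal{C},\mathcal{C}'}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}}$ ■

When $\tau \in \mathcal{L}(V)$ is a linear operator on V, it is customary to represent $\tau$ by matrices of the form $[\tau]_{\mathcal{B}}$, where the ordered bases used to represent vectors in the domain and image are the same. We leave it to the reader to show that $M_{\mathcal{B},\mathcal{B}'}$ is invertible and that

$$(M_{\mathcal{B},\mathcal{B}'})^{-1} = M_{\mathcal{B}',\mathcal{B}}$$

Hence, when $\mathcal{B} = \mathcal{C}$ and $\mathcal{B}' = \mathcal{C}'$, Theorem 2.15 takes the following important form.

Theorem 2.16 Let $\tau \in \mathcal{L}(V)$, and let $\mathcal{B}$ and $\mathcal{B}'$ be ordered bases for V. Then the matrix of $\tau$ with respect to $\mathcal{B}'$ can be expressed in terms of the matrix of $\tau$ with respect to $\mathcal{B}$ as follows

(2.4) $[\tau]_{\mathcal{B}'} = M_{\mathcal{B},\mathcal{B}'}\,[\tau]_{\mathcal{B}}\,(M_{\mathcal{B},\mathcal{B}'})^{-1}$ ■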
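Formula (2.4) can be tried on a small case. The sketch below is an illustration under stated assumptions: V is taken to be $\mathbb{Q}^2$, $\mathcal{B}$ is the standard basis, the operator and the second basis $\mathcal{B}' = ((1,1),(1,-1))$ are made up for the example, and the helper names are not from the text. Since the columns of `P_new` are the vectors of $\mathcal{B}'$ written in standard coordinates, `P_new` plays the role of $M_{\mathcal{B}',\mathcal{B}}$ and its inverse plays the role of $M_{\mathcal{B},\mathcal{B}'}$.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

F = Fraction
A = [[F(2), F(1)], [F(1), F(2)]]        # matrix of the operator in the standard basis B
P_new = [[F(1), F(1)], [F(1), F(-1)]]   # columns are the vectors of B'; this is M_{B',B}
M = inverse_2x2(P_new)                  # M_{B,B'}

A_new = mat_mul(mat_mul(M, A), P_new)   # formula (2.4): M [tau]_B M^{-1}
print(A_new)
```

With this particular choice of $\mathcal{B}'$ the new matrix turns out to be diagonal, with entries 3 and 1, because the chosen basis vectors happen to be eigenvectors of A.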

Equivalence of Matrices 

Since change of basis matrices are invertible, (2.3) has the form

$$[\tau]_{\mathcal{B}',\mathcal{C}'} = P\,[\tau]_{\mathcal{B},\mathcal{C}}\,Q^{-1}$$

where P and Q are invertible matrices. This leads to the following definition.

Definition Two matrices A and B are equivalent if there exist invertible matrices P and Q for which

$$B = PAQ^{-1}$$ □

We remarked in Chapter 0 that B is equivalent to A if and only if B can be obtained from A by a series of elementary row and column operations. Performing the row operations is equivalent to multiplying the matrix A on the left by P, and performing the column operations is equivalent to multiplying A on the right by $Q^{-1}$.

In terms of (2.3), we see that performing row operations (premultiplying by P) is equivalent to changing the basis used to represent vectors in the image, and performing column operations (postmultiplying by $Q^{-1}$) is equivalent to changing the basis used to represent vectors in the domain.

According to Theorem 2.15, if A and B are matrices that represent $\tau$ with respect to possibly different ordered bases, then A and B are equivalent. The converse of this also holds.

Theorem 2.17 The following statements are equivalent for matrices A and B.

1) If A represents a linear transformation $\tau\colon V \to W$, with respect to ordered bases $\mathcal{B}$ and $\mathcal{C}$, then B also represents $\tau$, but perhaps with respect to different ordered bases. That is, if

$$A = [\tau]_{\mathcal{B},\mathcal{C}}$$

then there exist ordered bases $\mathcal{B}'$ and $\mathcal{C}'$ for which

$$B = [\tau]_{\mathcal{B}',\mathcal{C}'}$$

2) A and B are equivalent.

Proof. Suppose that (1) holds. The matrix A represents the linear transformation $\tau_A$ with respect to the standard bases. Hence, B must also represent $\tau_A$, but with respect to possibly different ordered bases, which means that (2.3) holds, and so A and B are equivalent. Conversely, suppose that A and B are equivalent, and so

$$B = PAQ^{-1}$$

where P and Q are invertible. Let $\tau \in \mathcal{L}(V,W)$, let $\mathcal{B}$ and $\mathcal{C}$ be ordered bases for V and W, respectively, and suppose that

$$A = [\tau]_{\mathcal{B},\mathcal{C}}$$

We have seen earlier (after Theorem 2.12) that there exist ordered bases $\mathcal{B}'$ and $\mathcal{C}'$ for which $P = M_{\mathcal{C},\mathcal{C}'}$ and $Q^{-1} = M_{\mathcal{B}',\mathcal{B}}$. Hence,

$$B = M_{\mathcal{C},\mathcal{C}'}\,[\tau]_{\mathcal{B},\mathcal{C}}\,M_{\mathcal{B}',\mathcal{B}}$$

But then, according to Theorem 2.15, $B = [\tau]_{\mathcal{B}',\mathcal{C}'}$. Hence (1) holds. ■



Similarity of Matrices 

As we mentioned earlier, when $\tau \in \mathcal{L}(V)$ is a linear operator on V, it is customary to represent $\tau$ by matrices of the form $[\tau]_{\mathcal{B}}$, where the ordered bases used to represent vectors in the domain and image are the same. In this case, (2.4) has the form

$$[\tau]_{\mathcal{B}'} = P\,[\tau]_{\mathcal{B}}\,P^{-1}$$

where P is an invertible matrix. This prompts the following definition.

Definition Two matrices A and B are similar if there exists an invertible matrix P for which

$$B = PAP^{-1}$$

The equivalence classes associated with similarity are called similarity classes. □

The analog of Theorem 2.17 in this case is the following.

Theorem 2.18 The following statements are equivalent for matrices A and B.

1) If A represents a linear operator $\tau\colon V \to V$ with respect to an ordered basis $\mathcal{B}$, then B also represents $\tau$, but perhaps with respect to a different ordered basis. That is, if

$$A = [\tau]_{\mathcal{B}}$$

then there exists an ordered basis $\mathcal{B}'$ for which

$$B = [\tau]_{\mathcal{B}'}$$

2) A and B are similar. ■

Theorem 2.18 can be paraphrased by saying that two matrices represent the same linear operator on V if and only if they are similar. We will devote much effort in Chapter 7 to finding a canonical form for similarity.



Invariant Subspaces and Reducing Pairs 

Let $\tau$ be a linear operator on V. If S is a subspace of V, there is no guarantee that, for a given $s \in S$, the vector $\tau(s)$ will also be in S. This prompts us to make the following definition.

Definition Let $\tau$ be a linear operator on V. A subspace S of V is said to be invariant under $\tau$ if $\tau(S) \subseteq S$, that is, if $\tau(s) \in S$ for all $s \in S$. Put another way, S is invariant under $\tau$ if the restriction $\tau|_S$, which a priori maps S to V, is actually a linear operator on S. □



If S is an invariant subspace of V and if $S^c$ is a complementary subspace to S, then

$$V = S \oplus S^c$$

However, this does not imply that $S^c$ is also invariant under $\tau$. (The reader may wish to supply a simple example with $V = \mathbb{R}^2$.) This leads us to make the following definition.



Definition Let $\tau$ be a linear operator on V. If $V = S \oplus T$ and if both S and T are invariant under $\tau$, we say that the pair (S,T) reduces $\tau$. Put another way, (S,T) reduces $\tau$ if the restrictions $\tau|_S$ and $\tau|_T$ are linear operators on S and T, respectively. □
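To see that an invariant subspace need not belong to a reducing pair, one possible example (an assumption of this sketch, not taken from the text) is the shear $\tau(x,y) = (x+y,\,y)$ on $\mathbb{R}^2$. The x-axis S is invariant, while every complement of S is a line spanned by a vector of the form (a,1), and none of these lines is invariant. The code checks this for a few sample values of a.

```python
from fractions import Fraction

def tau(v):
    """The shear (x, y) -> (x + y, y) on the plane."""
    x, y = v
    return (x + y, y)

def in_span(v, w):
    """True if v is a scalar multiple of the nonzero plane vector w."""
    return v[0] * w[1] - v[1] * w[0] == 0

s = (Fraction(1), Fraction(0))           # spans the x-axis S
S_invariant = in_span(tau(s), s)

# Every complement of S is spanned by some (a, 1); test a few of them.
samples = [(Fraction(a), Fraction(1)) for a in range(-3, 4)]
complement_invariant = [in_span(tau(t), t) for t in samples]
print(S_invariant, complement_invariant)
```

The determinant test in `in_span` evaluates to 1 for every vector (a,1), whatever the value of a, so the failure observed on the samples holds for all complements of S.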

Definition Let $\rho$ be a linear operator on V. Then we write $\rho = \sigma \oplus \tau$, and call $\rho$ the direct sum of $\sigma$ and $\tau$, if there exist subspaces S and T of V for which (S,T) reduces $\rho$ and

$$\sigma = \rho|_S \quad\text{and}\quad \tau = \rho|_T$$ □

The concept of the direct sum of linear operators will play a key role in the study of the structure of a linear operator. For, in a sense we will make precise later, if $\rho = \sigma \oplus \tau$, then we have a decomposition of $\rho$ into simpler linear operators $\sigma$ and $\tau$.



EXERCISES 

1. Can you think of any other examples of algebras besides $\mathcal{L}(V)$?

2. Prove Corollary 2.9, and find an example to show that it does not hold without the finiteness condition.

3. Let $\tau \in \mathcal{L}(V,W)$. Prove that if $\mathcal{B}$ is a basis for V and if $\tau(\mathcal{B}) = \{\tau(b) \mid b \in \mathcal{B}\}$ is a basis for W, then $\tau$ is an isomorphism from V onto W.

4. Let V and W be vector spaces over F. Show that $V \approx W$ if and only if dim(V) = dim(W).

5. Let $\tau \in \mathcal{L}(V,W)$. Prove that $\tau$ is injective if and only if whenever $v_1,\dots,v_n$ are linearly independent in V, then $\tau v_1,\dots,\tau v_n$ are linearly independent in W.

6. Let $\tau \in \mathcal{L}(V,W)$. Prove that $\tau$ is an isomorphism if and only if it carries a basis for V to a basis for W.

7. If $\tau \in \mathcal{L}(V_1,W_1)$ and $\sigma \in \mathcal{L}(V_2,W_2)$, we define the external direct sum $\tau \boxplus \sigma \in \mathcal{L}(V_1 \boxplus V_2,\, W_1 \boxplus W_2)$ by

$$(\tau \boxplus \sigma)((v_1,v_2)) = (\tau(v_1),\sigma(v_2))$$

Show that $\tau \boxplus \sigma$ is a linear transformation.

8. Prove that the kernel and image of a linear transformation $\tau\colon V \to W$ are subspaces of V and W, respectively.

9. Let $V = S \oplus T$. Prove that $S \oplus T \approx S \boxplus T$, where $\boxplus$ stands for the external direct sum. Thus, up to isomorphism, internal and external direct sums are the same.

10. Let A be an $m \times n$ matrix. Show that $im(\tau_A)$ is the column space of A. Show that $rk(\tau_A) = rk(A)$.

11. Let $\tau \in \mathcal{L}(V)$, where $dim(V) < \infty$. If $rk(\tau^2) = rk(\tau)$, show that $im(\tau) \cap ker(\tau) = \{0\}$.

12. Let $\tau \in \mathcal{L}(U,V)$ and $\sigma \in \mathcal{L}(V,W)$. Show that

$$rk(\sigma\tau) \le \min\{rk(\tau), rk(\sigma)\}$$

13. Let $\tau \in \mathcal{L}(U,V)$ and $\sigma \in \mathcal{L}(V,W)$. Show that

$$null(\sigma\tau) \le null(\tau) + null(\sigma)$$

14. Let $\tau, \sigma \in \mathcal{L}(V)$, where $\tau$ is nonsingular. Show that $rk(\tau\sigma) = rk(\sigma\tau) = rk(\sigma)$.

15. Let $\tau, \sigma \in \mathcal{L}(V,W)$. Show that $rk(\tau + \sigma) \le rk(\tau) + rk(\sigma)$.

16. Let S be a subspace of V. Show that there is a $\tau \in \mathcal{L}(V)$ for which $ker(\tau) = S$. Show also that there exists a $\sigma \in \mathcal{L}(V)$ for which $im(\sigma) = S$.

17. Prove that any change of basis matrix $M_{\mathcal{B},\mathcal{C}}$ is nonsingular and that

$$(M_{\mathcal{B},\mathcal{C}})^{-1} = M_{\mathcal{C},\mathcal{B}}$$

18. Describe the counterpart of Theorem 2.11 for a linear transformation $\tau\colon V \to W$.

19. Let $V = S_1 \oplus S_2$. Define linear operators $\rho_i$ on V by $\rho_i(s_1 + s_2) = s_i$, for i = 1, 2. These are referred to as projection operators. Show that

1) $\rho_i^2 = \rho_i$

2) $\rho_1 + \rho_2 = \iota$, where $\iota$ is the identity map on V.

3) $\rho_i\rho_j = 0$, for $i \ne j$, where 0 is the zero map.

4) $V = im(\rho_1) \oplus im(\rho_2)$

20. Suppose that $\tau \in \mathcal{L}(V)$ has the property that $\tau^2 = \tau \circ \tau = 0$. Show that $2\,rk(\tau) \le dim(V)$.

21. Let A be an $m \times n$ matrix over F. What is the relationship between the linear transformation $\tau_A\colon F^n \to F^m$ and the system of equations AX = B? Use your knowledge of linear transformations to state and prove various results concerning the system AX = B, especially when B = 0.

22. Draw a figure similar in spirit to Figure 2.3 to show the situation where a single matrix M represents two different linear transformations $\tau_1\colon V \to W$ and $\tau_2\colon V \to W$. What is the connection between $\tau_1$ and $\tau_2$?

23. Find an example of a vector space V, and a proper subspace S of V, for which $V \approx S$.

24. Let $dim(V) < \infty$. If $\tau, \sigma \in \mathcal{L}(V)$, prove that $\sigma\tau = \iota$ implies that $\tau$ and $\sigma$ are invertible, and that $\sigma = p(\tau)$ for some polynomial $p(x) \in F[x]$.

25. Let $\tau \in \mathcal{L}(V)$, where $dim(V) < \infty$. If $\tau\sigma = \sigma\tau$ for all $\sigma \in \mathcal{L}(V)$, show that $\tau = r\iota$, for some $r \in F$. ($\iota$ is the identity map.)




CHAPTER 3 

The Isomorphism Theorems 



Contents: Quotient Spaces, The First Isomorphism Theorem, The Dimension of a Quotient Space, Additional Isomorphism Theorems, Linear Functionals, Dual Bases, Reflexivity, Annihilators, Operator Adjoints, Exercises.



Quotient Spaces 

Let S be a subspace of a vector space V, and let $\equiv_S$ be the binary relation on V defined by

$$u \equiv_S v \iff u - v \in S$$

It is easy to see that $\equiv_S$ is an equivalence relation. When $u \equiv_S v$, we say that u and v are congruent modulo S. The term mod is used as a colloquialism for modulo, and $u \equiv_S v$ is often written

$$u \equiv v \bmod S$$

When the subspace in question is clear, we will simply write $u \equiv v$.

To see what the equivalence classes look like, observe that

$$\begin{aligned}
[v] &= \{u \in V \mid u \equiv v\} \\
&= \{u \in V \mid u - v \in S\} \\
&= \{u \in V \mid u = v + s \text{ for some } s \in S\} \\
&= \{v + s \mid s \in S\} \\
&= v + S
\end{aligned}$$

The set $[v] = v + S = \{v + s \mid s \in S\}$ is called a coset of S in V.

Thus, the equivalence classes for congruence mod S are the cosets $v + S$ of S in V. The set of all cosets is denoted by

$$V/S = \{v + S \mid v \in V\}$$

This is read "V mod S" and is called the quotient space of V modulo S. Of course, the term space is a hint that we intend to define vector space operations on V/S.

Before doing so, however, observe that Theorem 0.8 implies

(3.1) $u + S = v + S \iff v \in u + S \iff u \in v + S$

and

$$u, v \in V \;\Rightarrow\; u + S = v + S \text{ or } (u + S) \cap (v + S) = \emptyset$$

Thus, a coset may be written in the form $v + S$ for many different vectors v. In fact, (3.1) implies that

$$u + S = v + S \iff u - v \in S$$

When a coset is written $v + S$, the vector v is called a coset representative for that coset. Clearly, any vector in the coset can be a coset representative.

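Observe also that

$$\begin{aligned}
u \equiv v &\;\Rightarrow\; u - v \in S \\
&\;\Rightarrow\; r(u - v) \in S \text{ for all } r \in F \\
&\;\Rightarrow\; ru - rv \in S \text{ for all } r \in F \\
&\;\Rightarrow\; ru \equiv rv
\end{aligned}$$

and so

(3.2) $u \equiv v \;\Rightarrow\; ru \equiv rv$ for all $r \in F$

In addition,

$$\begin{aligned}
u_1 \equiv v_1 \text{ and } u_2 \equiv v_2 &\;\Rightarrow\; u_1 - v_1 \in S \text{ and } u_2 - v_2 \in S \\
&\;\Rightarrow\; (u_1 - v_1) + (u_2 - v_2) \in S \\
&\;\Rightarrow\; (u_1 + u_2) - (v_1 + v_2) \in S \\
&\;\Rightarrow\; (u_1 + u_2) \equiv (v_1 + v_2)
\end{aligned}$$

and so

(3.3) $u_1 \equiv v_1$ and $u_2 \equiv v_2 \;\Rightarrow\; (u_1 + u_2) \equiv (v_1 + v_2)$

Properties (3.2) and (3.3) imply that congruence mod S preserves the vector space operations on V.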

A natural choice for vector space operations on V/S is

$$(u + S) + (v + S) = (u + v) + S$$

and

$$r(u + S) = ru + S$$

However, a coset generally has many different coset representatives, and these definitions seem to depend on which representative is chosen. In order to show that they are well-defined, it is necessary to show that they do not depend on the choice of coset representatives, that is,

$$u_1 + S = u_2 + S \text{ and } v_1 + S = v_2 + S \;\Rightarrow\; (u_1 + v_1) + S = (u_2 + v_2) + S$$

and

$$u_1 + S = u_2 + S \;\Rightarrow\; ru_1 + S = ru_2 + S$$

The straightforward details of this are left to the reader. Let us summarize.

Theorem 3.1 Let S be a subspace of V. The binary relation

$$u \equiv v \iff u - v \in S$$

is an equivalence relation on V, whose equivalence classes are the cosets

$$v + S = \{v + s \mid s \in S\}$$

of S in V. The set V/S of all cosets of S in V, called the quotient space of V modulo S, is a vector space under the well-defined operations

$$(u + S) + (v + S) = (u + v) + S \quad\text{and}\quad r(u + S) = ru + S$$

The zero vector in V/S is the coset $0 + S = S$. ■
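For a finite vector space, the cosets and the well-definedness of coset addition can be verified by brute force. The following sketch assumes the small example $V = \mathbb{Z}_2^3$ with S spanned by (1,1,0); both the example and the helper names are chosen only for illustration.

```python
from itertools import product

def add(u, v):
    """Componentwise addition in Z_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

V = list(product((0, 1), repeat=3))   # all 8 vectors of Z_2^3
S = [(0, 0, 0), (1, 1, 0)]            # the subspace spanned by (1, 1, 0)

def coset(v):
    """The coset v + S, stored as a frozenset so that cosets can be compared."""
    return frozenset(add(v, s) for s in S)

cosets = {coset(v) for v in V}

# Coset addition is well-defined: changing representatives does not change the sum.
well_defined = all(
    coset(add(u1, v1)) == coset(add(u2, v2))
    for u1 in V for u2 in V if coset(u1) == coset(u2)
    for v1 in V for v2 in V if coset(v1) == coset(v2)
)
print(len(cosets), well_defined)
```

The eight vectors fall into four cosets of two elements each, which matches dim(V/S) = 3 - 1 = 2 over $\mathbb{Z}_2$.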

Let S be a subspace of V, and define a map $\pi_S\colon V \to V/S$ by

$$\pi_S(v) = v + S$$

for all $v \in V$. This map is called the canonical projection, or natural projection, of V onto V/S, or simply projection modulo S. It is easily seen to be linear, for we have (writing $\pi$ for $\pi_S$)

$$\pi(ru + sv) = (ru + sv) + S = r(u + S) + s(v + S) = r\pi(u) + s\pi(v)$$

The canonical projection is surjective, since $v + S = \pi(v)$ for any coset $v + S$. To determine the kernel of $\pi$, note that

$$v \in ker(\pi) \iff \pi(v) = 0 \iff v + S = S \iff v \in S$$

and so

$$ker(\pi) = S$$

Theorem 3.2 The canonical projection $\pi_S\colon V \to V/S$ defined by

$$\pi_S(v) = v + S$$

is a surjective linear transformation, with $ker(\pi_S) = S$. ■
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The First Isomorphism Theorem 

Let S be a subspace of V. Figure 3.1 shows a linear transformation $\tau \in \mathcal{L}(V,W)$, along with the canonical projection $\pi_S$ from V to the quotient space V/S.

Figure 3.1

This figure suggests the existence of a map $\tau'$ from the quotient space V/S to W with the property that

(3.4) $\tau' \circ \pi_S = \tau$

that is,

$$\tau(v) = (\tau' \circ \pi_S)(v) = \tau'(v + S)$$

So let us define a function $\tau'$ from V/S to W by

$$\tau'(v + S) = \tau(v)$$

This function is well-defined if and only if

$$v + S = u + S \;\Rightarrow\; \tau'(v + S) = \tau'(u + S)$$

or, equivalently,

$$v + S = u + S \;\Rightarrow\; \tau(v) = \tau(u)$$

But this is equivalent to

$$v - u \in S \;\Rightarrow\; \tau(v - u) = 0$$

or, replacing $v - u$ by x,

$$x \in S \;\Rightarrow\; \tau(x) = 0$$

Thus, $\tau'$ is well-defined if and only if $S \subseteq ker(\tau)$.

Let us suppose that $S \subseteq ker(\tau)$, and hence that $\tau'$ is well-defined. Then $\tau'\colon V/S \to W$ is a linear transformation, with image

$$im(\tau') = \{\tau'(v + S) \mid v + S \in V/S\} = \{\tau(v) \mid v \in V\} = im(\tau)$$

and kernel

$$\begin{aligned}
ker(\tau') &= \{v + S \mid \tau'(v + S) = 0\} \\
&= \{v + S \mid \tau(v) = 0\} \\
&= \{v + S \mid v \in ker(\tau)\}
\end{aligned}$$

Also, $\tau'$ is unique in the sense that there is only one map $\tau'\colon V/S \to W$ with the property that $\tau' \circ \pi_S = \tau$.

Theorem 3.3 Let $\tau \in \mathcal{L}(V,W)$ and let $S \subseteq ker(\tau)$ be a subspace of V. Then, as pictured in Figure 3.1, there is a unique linear transformation $\tau'\colon V/S \to W$ with the property that

$$\tau' \circ \pi_S = \tau$$

Moreover, $ker(\tau') = \{v + S \mid v \in ker(\tau)\}$ and $im(\tau') = im(\tau)$. ■

The situation illustrated in Figure 3.1 is often described by saying that any linear transformation $\tau\colon V \to W$ can be factored through the projection map $\pi_S$ for $S \subseteq ker(\tau)$.

Theorem 3.3 has a very important corollary, which is often called the first isomorphism theorem, and is obtained by taking $S = ker(\tau)$.

Theorem 3.4 (The first isomorphism theorem) Let $\tau\colon V \to W$ be a linear transformation. Then the linear transformation $\tau'\colon V/ker(\tau) \to W$ defined by

$$\tau'(v + ker(\tau)) = \tau(v)$$

is injective, and so

$$\frac{V}{ker(\tau)} \approx im(\tau)$$ ■
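The first isomorphism theorem can be checked exhaustively on a finite example. The sketch below assumes a hypothetical map $\tau\colon \mathbb{Z}_2^3 \to \mathbb{Z}_2^2$ given by $\tau(x,y,z) = (x+y,\,z)$, chosen only for illustration. It builds the induced map on cosets of the kernel and confirms that this map is well-defined, injective, and has the same image as $\tau$.

```python
from itertools import product

def add(u, v):
    """Componentwise addition in Z_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def tau(v):
    """A sample linear map from Z_2^3 to Z_2^2."""
    x, y, z = v
    return ((x + y) % 2, z)

V = list(product((0, 1), repeat=3))
kernel = [v for v in V if tau(v) == (0, 0)]
image = {tau(v) for v in V}

def coset(v):
    """The coset v + ker(tau)."""
    return frozenset(add(v, k) for k in kernel)

# Build the induced map on cosets and check it does not depend on the representative.
induced = {}
well_defined = True
for v in V:
    c = coset(v)
    if c in induced and induced[c] != tau(v):
        well_defined = False
    induced[c] = tau(v)

injective = len(set(induced.values())) == len(induced)
same_image = set(induced.values()) == image
print(well_defined, injective, same_image)
```

Here the kernel has two elements, so there are four cosets, and they correspond one-to-one with the four vectors in the image.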

According to Theorem 3.4, the image of any linear transformation 
with domain V is isomorphic to a quotient space of V. Thus, by 
identifying isomorphic spaces as being essentially the same, we can say 
that the images of linear transformations on V are just the quotient 
spaces of V. Conversely, any quotient space V/S of V is the image 
of a linear transformation on V; in particular, V/S is the image of the
surjective canonical projection map $\pi_S\colon V \to V/S$. Thus, up to
isomorphism, images of linear transformations on V are the same as 
quotient spaces of V. 



The Dimension of a Quotient Space 

The first isomorphism theorem gives further insight into quotient spaces. Recall that any subspace S of V has a complement $S^c$, for which

$$V = S \oplus S^c$$

Since every vector $v \in V$ has the form $v = s + s^c$, for unique vectors $s \in S$ and $s^c \in S^c$, we can define a linear operator $\rho\colon V \to V$ by setting

$$\rho(s + s^c) = s^c$$

Because s and $s^c$ are unique, $\rho$ is well-defined. It is called projection onto $S^c$. (Note the word onto, rather than modulo.) It is clear that

$$im(\rho) = S^c$$

and

$$ker(\rho) = \{s + s^c \in V \mid s^c = 0\} = S$$

Hence, the first isomorphism theorem implies that

$$\frac{V}{S} \approx S^c$$

In other words, we have the following. 



Theorem 3.5 Let S be a subspace of V. Then any complement of S in V is isomorphic to the quotient space V/S. ■

Corollary 3.6 Let S be a subspace of a vector space V. Then

$$dim(V) = dim(S) + dim(V/S)$$ □

The dimension of the quotient space V/S is often called the 
codimension of S in V. 



Additional Isomorphism Theorems 

There are several other isomorphism theorems that are 
consequences of the first isomorphism theorem. 

Theorem 3.7 (The second isomorphism theorem) Let V be a vector space, and let S and T be subspaces of V. Then

$$\frac{S + T}{T} \approx \frac{S}{S \cap T}$$

Proof. Let $\tau\colon (S + T) \to S/(S \cap T)$ be defined by

$$\tau(s + t) = s + (S \cap T)$$

We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation, with kernel T. An application of the first isomorphism theorem then completes the proof. ■

Theorem 3.8 (The third isomorphism theorem) Let V be a vector space, and suppose that $S \subseteq T \subseteq V$ are subspaces of V. Then

$$\frac{V/S}{T/S} \approx \frac{V}{T}$$

Proof. Let $\tau\colon V/S \to V/T$ be defined by $\tau(v + S) = v + T$. We leave it to the reader to show that $\tau$ is a well-defined surjective linear transformation whose kernel is T/S. The rest follows from the first isomorphism theorem. ■

Theorem 3.9 Let V be a vector space, and let S be a subspace of V. Suppose that $V = V_1 \oplus V_2$ and $S = S_1 \oplus S_2$. Then

$$\frac{V}{S} = \frac{V_1 \oplus V_2}{S_1 \oplus S_2} \approx \frac{V_1}{S_1} \boxplus \frac{V_2}{S_2}$$

(Recall that $\boxplus$ stands for the external direct sum.)

Proof. Let $\tau\colon V \to (V_1/S_1) \boxplus (V_2/S_2)$ be defined by

$$\tau(v_1 + v_2) = (v_1 + S_1,\; v_2 + S_2)$$

This map is well-defined, since the sum $V = V_1 \oplus V_2$ is direct. We leave it to the reader to show that $\tau$ is a surjective linear transformation, whose kernel is $S_1 \oplus S_2$. The rest follows from the first isomorphism theorem. ■



Linear Functionals 

Linear transformations from V to the base field F (thought of 
as a vector space over itself) are extremely important. 

Definition Let V be a vector space over F. A linear transformation $f \in \mathcal{L}(V,F)$, whose values lie in the base field F, is called a linear functional (or simply functional) on V. The set of all linear functionals on V is denoted by V* and is called the algebraic dual space of V. □

The adjective algebraic is needed here, since there is another type 
of dual space that is defined on normed vector spaces, where continuity 
of linear transformations makes sense. We will discuss the continuous 
dual space briefly in Chapter 13. 

To help distinguish linear functionals from other types of linear 
transformations, we will usually denote linear functionals by lower-case 
Roman letters, such as f, g, and h. 

Note that, according to Theorem 2.1, the dual space V* is a 
vector space. 

Example 3.1 The map $f\colon F[x] \to F$, defined by $f(p(x)) = p(0)$, is a linear functional, known as evaluation at 0. □

Example 3.2 Let C[a,b] denote the vector space of all continuous functions on $[a,b] \subseteq \mathbb{R}$. Let $f\colon C[a,b] \to \mathbb{R}$ be defined by

$$f(\alpha(x)) = \int_a^b \alpha(x)\,dx$$

Then $f \in C[a,b]^*$. □

According to Theorem 2.8, for any $f \in V^*$,

$$dim(ker(f)) + dim(im(f)) = dim(V)$$

But, since $im(f) \subseteq F$, we have either $im(f) = \{0\}$, in which case f is the zero linear functional, or $im(f) = F$, in which case f is surjective. In other words, a nonzero linear functional is surjective. Moreover, if $dim(V) < \infty$ and f is nonzero, then

$$dim(ker(f)) = dim(V) - 1$$

Thus, in loose terms, the kernel of a linear functional is a relatively "large" subspace of the domain V. Even if V is infinite dimensional, we can say that ker(f) has codimension 0 or 1.

The following theorem will prove very useful. 



Theorem 3.10

1) For any nonzero vector $v \in V$, there exists a linear functional $f \in V^*$ for which $f(v) \ne 0$.

2) A vector $v \in V$ is zero if and only if $f(v) = 0$ for all $f \in V^*$. ■



Dual Bases 

Suppose that V is finite dimensional, and let $\mathcal{B} = \{v_1,\dots,v_n\}$ be a basis for V. For each $1 \le i \le n$, we can define a linear functional $\nu_i \in V^*$, by the orthogonality condition

$$\nu_i(v_j) = \delta_{ij} \quad\text{for } j = 1,\dots,n$$

where $\delta_{ij}$, known as the Kronecker delta function, is defined by

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

Theorem 3.11 Let $\mathcal{B} = \{v_1,\dots,v_n\}$ be a basis for V. Then the linear functionals $\nu_1,\dots,\nu_n$ defined by

$$\nu_i(v_j) = \delta_{ij} \quad\text{for } j = 1,\dots,n$$

form a basis for the dual space V*. This basis $\mathcal{B}^* = \{\nu_1,\dots,\nu_n\}$ is called the dual basis for $\mathcal{B}$.

Proof. If

$$0 = r_1\nu_1 + \cdots + r_n\nu_n$$

where 0 represents the zero linear functional, then we may apply both sides of this to the basis vector $v_i$, to get

$$0 = 0(v_i) = \sum_j r_j\nu_j(v_i) = \sum_j r_j\delta_{ji} = r_i$$

and so $r_i = 0$, for all i. Hence, $\mathcal{B}^*$ is linearly independent.

Any $f \in V^*$ is uniquely determined by its values on the basis vectors $v_i$, say

$$f(v_i) = a_i$$

If we let g be the linear functional

$$g = a_1\nu_1 + \cdots + a_n\nu_n$$

then

$$g(v_j) = a_j = f(v_j)$$

and so $f = g \in span\{\nu_1,\dots,\nu_n\}$, which proves that $\mathcal{B}^*$ spans V*. Hence, $\mathcal{B}^*$ is a basis for V*. ■
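In coordinates, a dual basis can be computed by inverting a matrix. The sketch below assumes $V = \mathbb{Q}^3$ with a basis chosen for illustration, and represents a functional by its list of coefficients on the standard coordinates; the helper names are not from the text. If P is the matrix whose columns are the basis vectors, then the rows of $P^{-1}$ satisfy the orthogonality condition, so they represent the dual basis. The last lines also check the expansion $f = \sum_i f(v_i)\nu_i$ used in the proof above.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse of a square matrix, computed exactly with Fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def apply(functional, v):
    """Evaluate a functional, given by its coefficient list, at the vector v."""
    return sum(a * x for a, x in zip(functional, v))

basis = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]   # v_1, v_2, v_3 (an illustrative choice)
P = [list(row) for row in zip(*basis)]      # columns of P are the basis vectors
dual = inverse(P)                           # row i represents nu_i

gram = [[apply(nu, v) for v in basis] for nu in dual]   # should be the identity

f = [Fraction(2), Fraction(-1), Fraction(3)]            # an arbitrary functional
a = [apply(f, v) for v in basis]                        # its values a_i = f(v_i)
g = [sum(a[i] * dual[i][k] for i in range(3)) for k in range(3)]
print(gram, g == f)
```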



Corollary 3.12 If $dim(V) < \infty$, then $dim(V^*) = dim(V)$. ■



The next example shows that Corollary 3.12 does not hold 
without the finiteness condition. 



Example 3.3 Let V be an infinite dimensional vector space over the field $F = \mathbb{Z}_2 = \{0,1\}$, with basis $\mathcal{B}$. Since the only coefficients in F are 0 and 1, a finite linear combination over F is just a finite sum. Hence, V is the set of all finite sums of vectors in $\mathcal{B}$, and so according to Theorem 0.11,

$$|V| = |\mathcal{P}_0(\mathcal{B})| = |\mathcal{B}|$$

where $\mathcal{P}_0(\mathcal{B})$ denotes the set of all finite subsets of $\mathcal{B}$. (The finite sums in $\mathcal{B}$ are in one-to-one correspondence with the finite subsets of $\mathcal{B}$.)

On the other hand, each linear functional $f \in V^*$ is uniquely defined by specifying its values on the basis $\mathcal{B}$. Since these values must be either 0 or 1, specifying a linear functional is equivalent to specifying the subset of $\mathcal{B}$ on which f takes the value 1. In other words, there is a one-to-one correspondence between linear functionals on V and all subsets of $\mathcal{B}$. Hence,

$$|V^*| = |\mathcal{P}(\mathcal{B})| > |\mathcal{B}| = |V|$$

This shows that V* cannot be isomorphic to V, nor to any proper subset of V. Hence, $dim(V^*) > dim(V)$. □
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Reflexivity 

If V is a vector space, then so is the dual space V*. Hence, we may form the double dual space V**, which consists of all linear functionals $E\colon V^* \to F$. In other words, an element E of V** is a linear map that assigns a scalar to each linear functional on V.

With this firmly in mind, there is one rather obvious way to obtain an element of V**. Namely, if $v \in V$, consider the map $\bar{v}\colon V^* \to F$ defined by

$$\bar{v}(f) = f(v)$$

which sends the linear functional f to the scalar f(v). For obvious reasons, this map is called evaluation at v. To see that $\bar{v}$ is in V**, we must show that it is linear. But if $f, g \in V^*$, then

$$\bar{v}(rf + sg) = (rf + sg)(v) = rf(v) + sg(v) = r\bar{v}(f) + s\bar{v}(g)$$

and so $\bar{v}$ is indeed linear.

Since evaluation at v is in V** for all $v \in V$, we can define a map $\tau\colon V \to V^{**}$ by

$$\tau(v) = \bar{v}$$

This is called the canonical map (or the natural map) from V to V**. It is injective and, in the finite dimensional case, it is also surjective.

Theorem 3.13 The canonical map $\tau\colon V \to V^{**}$ defined by letting $\tau(v)$ be evaluation at v, is a monomorphism. Furthermore, if V is finite dimensional, then $\tau$ is an isomorphism.

Proof. To see that $\tau$ is linear, we observe that

$$\tau(ru + sv) = \overline{ru + sv}$$

is evaluation at $ru + sv$. But

$$\overline{ru + sv}(f) = f(ru + sv) = rf(u) + sf(v) = (r\bar{u} + s\bar{v})(f)$$

for all $f \in V^*$, and so

$$\tau(ru + sv) = \overline{ru + sv} = r\bar{u} + s\bar{v} = r\tau(u) + s\tau(v)$$

which shows that $\tau$ is linear.

To determine the kernel of $\tau$, we observe that

$$\begin{aligned}
\tau(v) = 0 &\;\Rightarrow\; \bar{v} = 0 \\
&\;\Rightarrow\; \bar{v}(f) = 0 \text{ for all } f \in V^* \\
&\;\Rightarrow\; f(v) = 0 \text{ for all } f \in V^* \\
&\;\Rightarrow\; v = 0
\end{aligned}$$

by Theorem 3.10, and so $ker(\tau) = \{0\}$, that is, $\tau$ is injective.

In the finite dimensional case, we have $dim(V^{**}) = dim(V^*) = dim(V)$, and so $\tau$ is also surjective. Thus, $\tau$ is an isomorphism. ■

Note that if $dim(V) < \infty$, then since the dimensions of V and V** are the same, we deduce immediately that $V \approx V^{**}$. This is not the point of Theorem 3.13. The point is that the natural map $v \mapsto \bar{v}$ is an isomorphism. Because of this, V is said to be algebraically reflexive. Thus, Theorem 3.13 implies that all finite dimensional vector spaces are algebraically reflexive.

If V is finite dimensional, it is customary to identify the double 
dual space V** with V, and to think of the elements of V** simply 
as vectors in V. 

Let us consider an example of a vector space that is not 
algebraically reflexive. 

Example 3.4 Let V be a vector space over F = {0,1}, with a countably infinite ordered basis $\mathcal{B} = (b_1, b_2, \dots)$. Then any vector $v \in V$ can be identified with its coordinate sequence

$$v = (a_1, a_2, \dots)$$

where $a_i \in \{0,1\}$ and only a finite number of the $a_i$ are equal to 1.

On the other hand, any $f \in V^*$ is uniquely determined by its values on the vectors in $\mathcal{B}$, and since these values can be arbitrarily chosen, f can be identified with a binary sequence

$$f = (f(b_1), f(b_2), \dots)$$

with no restriction on the number of 1s in the sequence.

Now, we define the support of a binary sequence $x = (x_1, x_2, \dots)$, denoted by supp(x), to be the set of coordinate positions where $x_i = 1$. Thus, a vector in V is a binary sequence with finite support, whereas a linear functional on V is any binary sequence.

A moment's reflection on the representation of f will reveal that

$$\bar{v}(f) = f(v) = |supp(v) \cap supp(f)| \bmod 2$$

We can show that the canonical map $\tau\colon v \mapsto \bar{v}$ is not surjective by finding a linear functional $\phi \in V^{**}$ that does not have the form $\bar{v}$, for any $v \in V$. To this end, define linear functionals $e_k \in V^*$ by

$$e_k = (0,\dots,0,1,0,\dots)$$

where the 1 appears in the kth position. Then $supp(e_k) = \{k\}$, and so

$$\bar{v}(e_k) = |supp(v) \cap supp(e_k)| = |supp(v) \cap \{k\}|$$

Hence, for any $v \in V$, the map $\bar{v}$ has the property that

(3.5) $k > \max\{supp(v)\} \;\Rightarrow\; \bar{v}(e_k) = 0$

Now, since the linear functionals $e_k$ are linearly independent, we may extend the set $\{e_k\}$ to a basis $\mathcal{E}$ for V*. Let us define a map $\phi \in V^{**}$ by setting $\phi(e_k) = 1$ for all k, and then extending $\phi$ to all vectors in the basis $\mathcal{E}$ arbitrarily. (In other words, we don't care how $\phi$ is defined on the other elements of $\mathcal{E}$.) Then $\phi$ defines a linear functional in V**, with the property that

$$\phi(e_k) = 1$$

for all k. But this shows, in conjunction with (3.5), that $\phi$ cannot have the form $\bar{v}$, for any $v \in V$. Hence, the canonical map is not surjective, and V is not algebraically reflexive. □



Annihilators 

If $f \in V^*$, then f is defined on vectors in V. However, we may also define f on subsets M of V by letting

$$f(M) = \{f(v) \mid v \in M\}$$

Definition Let M be a nonempty subset of a vector space V. The annihilator of M is

$$M^0 = \{f \in V^* \mid f(M) = \{0\}\}$$ □

The term annihilator is quite descriptive, since $M^0$ consists of all linear functionals that annihilate (send to 0) every vector in M.

It is not hard to see that $M^0$ is a subspace of V*, even when M is not a subspace of V. Subject to this, we prove the following.



Theorem 3.14 If S is a subspace of a finite dimensional vector space V, then

$$dim(S^0) = dim(V) - dim(S)$$

Proof. Let $\{u_1,\dots,u_k\}$ be a basis for S, and extend it to a basis

$$\mathcal{B} = \{u_1,\dots,u_k, v_1,\dots,v_{n-k}\}$$

for V. Let

$$\mathcal{B}^* = \{\mu_1,\dots,\mu_k, \nu_1,\dots,\nu_{n-k}\}$$

be the dual basis to $\mathcal{B}$. We show that $\{\nu_1,\dots,\nu_{n-k}\}$ is a basis for $S^0$. Certainly, this set is linearly independent, so we need only show that it spans $S^0$. But if $f \in S^0$, then since $f \in V^*$, we have

$$f = r_1\mu_1 + \cdots + r_k\mu_k + s_1\nu_1 + \cdots + s_{n-k}\nu_{n-k}$$

and since f(v) = 0 for all $v \in S$, we have for $i = 1,\dots,k$,

$$0 = f(u_i) = r_i$$

and so

$$f = s_1\nu_1 + \cdots + s_{n-k}\nu_{n-k}$$

which shows that $\{\nu_1,\dots,\nu_{n-k}\}$ spans $S^0$. ■
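The dimension formula of Theorem 3.14 can be confirmed by enumeration in a small finite case. The sketch below assumes $V = \mathbb{Z}_2^4$ and a subspace S spanned by two illustrative vectors; a linear functional on V is represented by a coefficient tuple acting through the dot product mod 2. These representation choices are assumptions of the example.

```python
from itertools import product

n = 4
V = list(product((0, 1), repeat=n))   # all vectors; also all functionals, as coefficient tuples

def add(u, v):
    """Componentwise addition in Z_2^n."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def dot(f, v):
    """Value of the functional with coefficient tuple f at the vector v."""
    return sum(a * b for a, b in zip(f, v)) % 2

generators = [(1, 0, 1, 0), (0, 1, 1, 0)]   # a linearly independent spanning set for S

# Build the span of the generators over Z_2.
S = {tuple([0] * n)}
for g in generators:
    S |= {add(s, g) for s in S}

annihilator = [f for f in V if all(dot(f, s) == 0 for s in S)]

dim_S = len(S).bit_length() - 1              # |S| = 2^dim(S)
dim_ann = len(annihilator).bit_length() - 1  # |S^0| = 2^dim(S^0)
print(dim_S, dim_ann, dim_ann == n - dim_S)
```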

Example 3.5 To see what can happen with regard to Theorem 3.14, in the infinite dimensional case, let us continue Example 3.4. Thus, V is a vector space over F = {0,1}, with a countably infinite ordered basis $\mathcal{B} = (b_1, b_2, \dots)$. Let S be the subspace of V with ordered basis $\mathcal{C} = (b_1)$, and let T be the subspace of V with ordered basis $\mathcal{D} = (b_2, b_3, \dots)$. Since $|\mathcal{B}| = |\mathcal{D}|$, we have $T \approx V$.

Now consider the annihilator $S^0$. Any linear functional $f \in T^*$ can be extended to a linear functional $\hat{f} \in V^*$ by setting $\hat{f}(b_1) = 0$. Moreover, any linear functional in $S^0$ has the form $\hat{f}$, for some $f \in T^*$, since $g \in S^0$ implies that $g(b_1) = 0$. Hence, there is a one-to-one correspondence between $S^0$ and T*, and so $|S^0| = |T^*|$. But $T \approx V$ implies that $T^* \approx V^*$, and so $|T^*| = |V^*|$, which implies that

$$|S^0| = |V^*|$$

But, we have seen in Example 3.3 that $|V^*| > |V|$, and so $|S^0| > |V|$, which implies that $S^0$ cannot be isomorphic to V, or any subspace of V. Hence, $dim(S^0) > dim(V)$. □

The basic properties of annihilators are contained in the following 
theorem. 

Theorem 3.15

1) For any subsets M and N of V,

$$M \subseteq N \;\Rightarrow\; N^0 \subseteq M^0$$

2) If $dim(V) < \infty$ then, identifying V** with V under the natural map, we have

$$M^{00} = span(M)$$

In particular, if S is a subspace of V, then $S^{00} = S$.

3) If $dim(V) < \infty$ and S and T are subspaces of V, then

$$(S \cap T)^0 = S^0 + T^0 \quad\text{and}\quad (S + T)^0 = S^0 \cap T^0$$ ■

The annihilator provides a way to describe the dual space of a direct sum.

Theorem 3.16 Let $V = S \oplus T$. Then

1) $S^* \approx T^0$ and $T^* \approx S^0$

2) $(S \oplus T)^* = S^0 \oplus T^0$

Proof. First, let us prove that $T^0 \approx S^*$. Roughly speaking, this says that any linear functional on V that annihilates the direct summand T is nothing more than a linear functional on its complement S, which certainly seems reasonable. The proof consists of making this precise. (Note that, in the finite dimensional case, a dimension argument establishes the isomorphism.)

For this purpose, let $f \in T^0 \subseteq V^*$. Thus, f(T) = 0. The map $\tau\colon T^0 \to S^*$ defined by

$$\tau(f) = f|_S$$

that takes a functional $f \in T^0$ to its restriction $f|_S$, which is in S*, is linear. Moreover, if $f|_S = 0$, then f(S) = 0, and since f(T) = 0, we have f = 0. Hence, $\tau$ is injective. Finally, we must show that $\tau$ is surjective. That is, for $g \in S^*$, we must find an $f \in T^0$ for which

$$f|_S(s) = g(s)$$

for all $s \in S$. In other words, we want to "extend" g to all of V, in such a way that the extension is in $T^0$. But that is easy: we just define the extension to be 0 on T. In particular, let $f \in V^*$ be defined by

$$f(s + t) = g(s)$$

Then f is well-defined and linear. Moreover, $f \in T^0$, since $f(t) = f(0 + t) = g(0) = 0$, for all $t \in T$. Finally, $f|_S$ is indeed g, and so $\tau$ is an isomorphism, which proves that $T^0 \approx S^*$. By symmetry, we also have $S^0 \approx T^*$.

To prove part 2, let $f \in S^0 \cap T^0$. Then f(S) = 0 = f(T), which implies that f = 0. Hence, $S^0 \cap T^0 = \{0\}$. Since $S^0$ and $T^0$ are subspaces of V*, we have $(S \oplus T)^* \supseteq S^0 \oplus T^0$.

On the other hand, if $f \in (S \oplus T)^*$, then we define $g, h \in (S \oplus T)^*$ by

$$g(s + t) = f(t) \quad\text{and}\quad h(s + t) = f(s)$$

It is easy to see that these maps are well-defined and linear. Moreover,

$$g(S) = 0 \quad\text{and}\quad h(T) = 0$$

and so $g \in S^0$ and $h \in T^0$. Finally,

$$f(s + t) = f(t) + f(s) = g(s + t) + h(s + t) = (g + h)(s + t)$$

and so $f = g + h \in S^0 \oplus T^0$. Hence, $(S \oplus T)^* \subseteq S^0 \oplus T^0$, which completes the proof. ■
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Operator Adjoints 

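If $\tau \in \mathcal{L}(V,W)$, then we may define a map
$\tau^\times \colon W^* \to V^*$ by

$$\tau^\times(f) = f \circ \tau = f\tau$$

for $f \in W^*$. This makes sense, since $\tau \colon V \to W$ and
$f \colon W \to F$, and so the composition $f\tau \colon V \to F$ is
in $V^*$. Thus

$$\tau^\times(f)(v) = f(\tau(v))$$

for any $v \in V$. The map $\tau^\times$ is called the operator
adjoint of $\tau$.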

Let us establish the basic properties of the operator adjoint. 

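Theorem 3.17

1) $(\tau + \sigma)^\times = \tau^\times + \sigma^\times$ for
$\tau, \sigma \in \mathcal{L}(V,W)$

2) $(r\tau)^\times = r\tau^\times$ for any $r \in F$ and
$\tau \in \mathcal{L}(V,W)$

3) $(\sigma\tau)^\times = \tau^\times\sigma^\times$ for
$\tau \in \mathcal{L}(V,W)$ and $\sigma \in \mathcal{L}(W,U)$

4) $(\tau^{-1})^\times = (\tau^\times)^{-1}$ for any invertible
$\tau \in \mathcal{L}(V)$

Proof. We prove parts 3 and 4. Part 3 follows from the fact that

$$(\sigma\tau)^\times(f) = f\sigma\tau = \tau^\times(f\sigma) = \tau^\times(\sigma^\times(f)) = (\tau^\times\sigma^\times)(f)$$

for all $f \in U^*$. Part 4 follows from

$$\tau^\times(\tau^{-1})^\times = (\tau^{-1}\tau)^\times = \iota^\times = \iota$$

and, in the same way, $(\tau^{-1})^\times\tau^\times = \iota$. ∎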

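If $\tau \in \mathcal{L}(V,W)$ then
$\tau^\times \in \mathcal{L}(W^*,V^*)$, and we may form
$\tau^{\times\times} \in \mathcal{L}(V^{**},W^{**})$. Of course,
$\tau^{\times\times}$ is not equal to $\tau$. However, in the finite
dimensional case, if we use the natural maps to identify $V^{**}$
with V and $W^{**}$ with W, then we can think of
$\tau^{\times\times}$ as being in $\mathcal{L}(V,W)$. With this in
mind, $\tau^{\times\times}$ is equal to $\tau$.

Theorem 3.18 Let V and W be finite dimensional, and let
$\tau \in \mathcal{L}(V,W)$. If we identify $V^{**}$ with V and
$W^{**}$ with W, using the natural maps, then
$\tau^{\times\times} = \tau$.

Proof. Before making any identifications, we have
$\tau^{\times\times} \colon V^{**} \to W^{**}$. Writing $\bar{v}$
for the image of v under the natural map (evaluation at v), we have

$$\tau^{\times\times}(\bar{v})(f) = \bar{v}\,\tau^\times(f) = \bar{v}(f\tau) = f\tau(v) = \overline{\tau(v)}(f)$$

for all $f \in W^*$, and so

$$\tau^{\times\times}(\bar{v}) = \overline{\tau(v)}$$

Therefore, with the appropriate identifications,

$$\tau^{\times\times}(v) = \tau(v)$$

for all $v \in V$, and so $\tau^{\times\times} = \tau$. ∎

The next result describes the kernel and image of the operator
adjoint.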

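Theorem 3.19 Let $\tau \in \mathcal{L}(V,W)$. Then

1) $\ker(\tau^\times) = \operatorname{im}(\tau)^0$

2) $\operatorname{im}(\tau^\times)^0 = \ker(\tau)$, under the natural
identification

3) $\operatorname{im}(\tau^\times) \subseteq \ker(\tau)^0$

4) if $\dim(V) < \infty$ and $\dim(W) < \infty$ then
$\operatorname{im}(\tau^\times) = \ker(\tau)^0$

Proof. For reference, note that $\tau \colon V \to W$ and
$\tau^\times \colon W^* \to V^*$. To prove (1), observe that

$$
\begin{aligned}
f \in \ker(\tau^\times) &\iff \tau^\times(f) = 0 \\
&\iff f\tau = 0 \\
&\iff f(\tau(v)) = 0 \text{ for all } v \in V \\
&\iff f(\operatorname{im}(\tau)) = 0 \\
&\iff f \in \operatorname{im}(\tau)^0
\end{aligned}
$$

To prove part 2, writing $\bar{v}$ for the image of v under the
natural map, we have

$$
\begin{aligned}
v \in \ker(\tau) &\iff \tau(v) = 0 \\
&\iff f(\tau(v)) = 0 \text{ for all } f \in W^* \\
&\iff \tau^\times(f)(v) = 0 \text{ for all } f \in W^* \\
&\iff \bar{v}(\tau^\times(f)) = 0 \text{ for all } f \in W^* \\
&\iff \bar{v} \in \operatorname{im}(\tau^\times)^0
\end{aligned}
$$

Part 3 is proved as follows. For all $v \in \ker(\tau)$ and
$f \in W^*$,

$$\tau^\times(f)(v) = f(\tau(v)) = 0$$

and so $\tau^\times(f)(\ker(\tau)) = 0$, that is,
$\tau^\times(f) \in \ker(\tau)^0$. Since this holds for all
$f \in W^*$, we have

$$\operatorname{im}(\tau^\times) \subseteq \ker(\tau)^0$$

As for part 4, when the vector spaces are finite dimensional, part 2
gives (take annihilators of both sides and apply Theorem 3.15)

$$\operatorname{im}(\tau^\times) \approx \ker(\tau)^0$$

But, according to part 3,
$\operatorname{im}(\tau^\times) \subseteq \ker(\tau)^0$, and so these
spaces must be equal. ∎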
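Corollary 3.20 Let $\tau \in \mathcal{L}(V,W)$, where V and W are
finite dimensional. Then
$\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$. ∎

In the finite dimensional case, $\tau \in \mathcal{L}(V,W)$ and
$\tau^\times \in \mathcal{L}(W^*,V^*)$ can both be represented by
matrices. To explore the connection between these matrices, suppose
that

$$\mathcal{B} = (b_1, \ldots, b_n) \quad\text{and}\quad \mathcal{C} = (c_1, \ldots, c_m)$$

are ordered bases for V and W, respectively, and that

$$\mathcal{B}^* = (b_1^*, \ldots, b_n^*) \quad\text{and}\quad \mathcal{C}^* = (c_1^*, \ldots, c_m^*)$$

are the corresponding dual bases. If we let

$$[\tau]_{\mathcal{B},\mathcal{C}} = (a_{ij})$$

then $a_{ij}$ is the coordinate of $c_i$ in $\tau(b_j)$.

On the other hand, if

$$[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} = (\alpha_{ij})$$

then $\alpha_{ij}$ is the coordinate of $b_i^*$ in
$\tau^\times(c_j^*)$. But this coordinate is

$$\tau^\times(c_j^*)(b_i) = c_j^*(\tau(b_i))$$

which is the coordinate of $c_j$ in $\tau(b_i)$, and this in turn is
$a_{ji}$. In short, $\alpha_{ij} = a_{ji}$ and so

$$[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} = ([\tau]_{\mathcal{B},\mathcal{C}})^T$$

We have established the following.

Theorem 3.21 Let $\tau \in \mathcal{L}(V,W)$, where V and W are
finite dimensional. If $\mathcal{B}$ is an ordered basis for V,
$\mathcal{C}$ is an ordered basis for W, and $\mathcal{B}^*$ and
$\mathcal{C}^*$ are the corresponding dual bases, then

$$[\tau^\times]_{\mathcal{C}^*,\mathcal{B}^*} = ([\tau]_{\mathcal{B},\mathcal{C}})^T$$

In words, the matrix of the adjoint $\tau^\times$ is the transpose of
the matrix of $\tau$. ∎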
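
The statement that the matrix of the operator adjoint is the
transpose can be checked numerically. The following is a minimal
sketch in Python, under the assumption that V and W are coordinate
spaces of dimensions 3 and 2 with their standard bases and that the
operator is given by an arbitrarily chosen matrix A. The names tau,
dual and adjoint are illustrative only. Functionals are modelled as
Python functions, and the adjoint is computed directly from its
definition as composition with the operator.

```python
# Matrix of tau relative to the standard bases (an arbitrary choice).
A = [[1, 2, 0],
     [0, 1, 3]]
m, n = len(A), len(A[0])

def tau(v):
    """Apply tau to a coordinate vector v of length n."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def std_basis(size):
    """Standard ordered basis of the coordinate space of the given size."""
    return [[1 if i == j else 0 for i in range(size)] for j in range(size)]

def dual(j):
    """The j-th dual basis functional: it reads off coordinate j."""
    return lambda w: w[j]

def adjoint(f):
    """Operator adjoint from the definition: f composed with tau."""
    return lambda v: f(tau(v))

B = std_basis(n)

# Entry (i, j) is the coordinate of the i-th dual basis functional of V
# in the adjoint applied to the j-th dual basis functional of W.
adjoint_matrix = [[adjoint(dual(j))(B[i]) for j in range(m)] for i in range(n)]
transpose_of_A = [list(col) for col in zip(*A)]

print(adjoint_matrix == transpose_of_A)
```

Since the entries are obtained by evaluating functionals rather than
by transposing A, the final comparison is a genuine check of the
coordinate argument and not a tautology.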

EXERCISES 

1. If S is a subspace of V, show that $u \equiv v \iff u - v \in S$
is an equivalence relation on V.

2. Prove that the operations of coset addition and scalar
multiplication are well-defined.

3. Prove that there is only one map from V/S to W with the property
described in Figure 3.1.

4. Prove the first isomorphism theorem.

5. Let S be a subspace of a vector space V. Show that

$$\dim(V) = \dim(S) + \dim(V/S)$$

6. Complete the proof of Theorem 3.9.

7. Let S be a subspace of V. Can you describe a relationship
between the set of all subspaces S' of V for which
$S \subseteq S' \subseteq V$ and the set of all subspaces of the
quotient space V/S?

8. Let S be a subspace of V. Starting with a basis for S, how would
you find a basis for V/S?

9. Use the First Isomorphism Theorem to prove that if
$\tau \colon V \to W$, then

$$\dim(\ker(\tau)) + \dim(\operatorname{im}(\tau)) = \dim(V)$$

10. Let $\tau \in \mathcal{L}(V)$, and suppose that S is a subspace
of V. Define a map $\tau' \colon V/S \to V/S$ by

$$\tau'(v + S) = \tau(v) + S$$

When is $\tau'$ well-defined? If $\tau'$ is well-defined, is it a
linear transformation? What are $\operatorname{im}(\tau')$ and
$\ker(\tau')$?

11. Show that, for any nonzero vector $v \in V$, there exists a
linear functional $f \in V^*$ for which $f(v) \neq 0$.

12. Show that a vector $v \in V$ is zero if and only if $f(v) = 0$
for all $f \in V^*$.

13. Let S be a proper subspace of a finite dimensional vector space
V, and let $v \in V \setminus S$. Show that there is a linear
functional $f \in V^*$ for which $f(v) = 1$ and $f(s) = 0$ for all
$s \in S$.

14. Let S be an (n-1)-dimensional subspace of an n-dimensional
vector space V. Show that there is a linear functional $f \in V^*$
whose kernel is S. If f and g are two such functionals, must
there be any relationship between them?

15. Let $\mathcal{B}$ be a basis for an infinite dimensional vector
space V, and define, for all $b \in \mathcal{B}$, the map
$b' \in V^*$ by $b'(c) = 1$ if $c = b$, and 0 otherwise. Does
$\{b' \mid b \in \mathcal{B}\}$ form a basis for $V^*$? What do you
conclude about the concept of a dual basis?

16. Show that $M^0$ is a subspace of $V^*$, for any nonempty subset
M of V.

17. Prove that $(S \oplus T)^* \approx S^* \oplus T^*$.

18. Prove that $0^\times = 0$, and that $\iota^\times = \iota$,
where 0 is the zero linear operator and $\iota$ is the identity.

19. Let S be a subspace of V. Prove that $(V/S)^* \approx S^0$.

20. Verify that

(a) $(\tau + \sigma)^\times = \tau^\times + \sigma^\times$ for
$\tau, \sigma \in \mathcal{L}(V,W)$.

(b) $(r\tau)^\times = r\tau^\times$ for any $r \in F$ and
$\tau \in \mathcal{L}(V,W)$.

21. Let $\tau \in \mathcal{L}(V,W)$, where V and W are finite
dimensional. Prove that
$\operatorname{rk}(\tau) = \operatorname{rk}(\tau^\times)$.

The Number of Subspaces of a Vector Space over a Finite Field

22. Let F be a finite field of size q, and let V be an n-dimensional
vector space over F. The purpose of this exercise is to show that
there are

$$\binom{n}{k}_q = \frac{(q^n - 1)\cdots(q - 1)}{(q^k - 1)\cdots(q - 1)\,(q^{n-k} - 1)\cdots(q - 1)}$$

subspaces of V of dimension k. The expressions $\binom{n}{k}_q$ are
called Gaussian coefficients, and have properties similar to those of
the binomial coefficients.

a) Let S(n,k) be the number of k-dimensional subspaces of V.
Let N(n,k) be the number of k-tuples of linearly independent
vectors $(v_1, \ldots, v_k)$ in V. Show that

$$N(n,k) = (q^n - 1)(q^n - q)\cdots(q^n - q^{k-1})$$

b) Now, each of the k-tuples in (a) can be obtained by first
choosing a subspace of V of dimension k, and then selecting
the vectors from this subspace. Show that, for any k-dimensional
subspace of V, the number of k-tuples of independent vectors in
this subspace is

$$(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$

c) Show that

$$N(n,k) = S(n,k)(q^k - 1)(q^k - q)\cdots(q^k - q^{k-1})$$

How does this complete the proof?
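
Before attempting the proof, the count can be sanity-checked by brute
force for small cases. The sketch below is in Python and assumes the
two-element field, with vectors of an n-dimensional space encoded as
the integers 0 to 2^n - 1 and vector addition as bitwise XOR. The
function names gaussian, span_f2 and count_subspaces_f2 are
hypothetical helpers introduced for this illustration. It is a
numerical check only and does not replace the counting argument of
parts (a) through (c).

```python
from itertools import product

def gaussian(n, k, q):
    """Gaussian coefficient, computed as N(n,k) divided by N(k,k) after
    cancelling the common powers of q."""
    num = den = 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den

def span_f2(vectors):
    """Span over the two-element field of bitmask vectors (XOR is addition)."""
    s = {0}
    for v in vectors:
        s |= {x ^ v for x in s}
    return frozenset(s)

def count_subspaces_f2(n, k):
    """Count k-dimensional subspaces by enumerating spans of all k-tuples."""
    spans = {span_f2(t) for t in product(range(2 ** n), repeat=k)}
    # A subspace of dimension k over the two-element field has 2^k elements.
    return sum(1 for s in spans if len(s) == 2 ** k)

print(gaussian(4, 2, 2), count_subspaces_f2(4, 2))
```

The enumeration is exponential in n and is only practical for very
small dimensions.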




CHAPTER 4

Modules I

Motivation

Let V be a vector space over a field F, and let
$\tau \in \mathcal{L}(V)$. Then for any polynomial $p(x) \in F[x]$,
the operator $p(\tau)$ is well-defined. For instance, if
$p(x) = 1 + 2x + x^3$, then

$$p(\tau) = \iota + 2\tau + \tau^3$$

where $\iota$ is the identity operator, and $\tau^3$ is the threefold
composition $\tau \circ \tau \circ \tau$.

We can now define the product of a polynomial $p(x) \in F[x]$ and
a vector $v \in V$ by

(4.1) $p(x)v = p(\tau)(v)$

This product satisfies the usual properties of scalar multiplication,
namely, for all $r(x), s(x) \in F[x]$ and $u, v \in V$,

$$
\begin{aligned}
r(x)(u + v) &= r(x)u + r(x)v \\
(r(x) + s(x))u &= r(x)u + s(x)u \\
[r(x)s(x)]u &= r(x)[s(x)u] \\
1u &= u
\end{aligned}
$$

Thus, for a fixed $\tau \in \mathcal{L}(V)$, V is an algebraic
structure under the operations of addition and scalar multiplication
by polynomials in F[x]. Note, however, that since the ring F[x] is
not a field, these two operations do not make V into a vector space.
Nevertheless, this important situation, which we will study
extensively in the sequel, motivates the following definition.
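
The product (4.1) is easy to experiment with once the operator is
given by a matrix. The following Python sketch assumes a
2-dimensional coordinate space and an arbitrarily chosen matrix A for
the operator; polynomials are represented as coefficient lists,
lowest degree first. The helper names mat_vec, act and poly_mul are
illustrative, not part of the text.

```python
def mat_vec(A, v):
    """Matrix-vector product over the integers (or any ring of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def act(p, A, v):
    """The product p(x)v of (4.1): apply p(A) to v.

    p is a list of coefficients [c0, c1, c2, ...], lowest degree first.
    """
    result = [0] * len(v)
    power = list(v)                    # A^0 v
    for c in p:
        result = [r + c * x for r, x in zip(result, power)]
        power = mat_vec(A, power)      # next power of A applied to v
    return result

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

A = [[0, 1],
     [1, 1]]                           # assumed matrix of the operator
u = [1, 0]
r = [1, 2, 0, 1]                       # 1 + 2x + x^3
s = [0, 1]                             # x

print(act(r, A, u))                    # (1 + 2x + x^3)u
# Third property in the list above: [r(x)s(x)]u = r(x)[s(x)u]
print(act(poly_mul(r, s), A, u) == act(r, A, act(s, A, u)))
```

Changing A changes the module structure on the same underlying
vector space, which is the point of fixing the operator in advance.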



Modules 

Definition Let R be a commutative ring with identity, whose elements
are called scalars. An R-module (or a module over R) is a nonempty
set M, together with two operations. The first operation, called
addition and denoted by +, assigns to each pair
$(u,v) \in M \times M$ an element $u + v \in M$. The second
operation, denoted by juxtaposition, assigns to each pair
$(r,u) \in R \times M$ an element $ru \in M$. Furthermore, the
following properties must hold.

1) M is an abelian group under addition.

2) For all $r, s \in R$ we have

$$
\begin{aligned}
r(u + v) &= ru + rv \\
(r + s)u &= ru + su \\
(rs)u &= r(su) \\
1u &= u
\end{aligned}
$$

for all $u, v \in M$. □

The definition of a module requires that the ring R of scalars be 
commutative. This requirement is sometimes omitted, but modules 
over noncommutative rings can behave quite differently than modules 
over commutative rings. For instance, it is possible for a module over a 
noncommutative ring to have bases of different sizes. Since such 
modules will not be needed for the sequel, we require commutativity. 

Even with the requirement of commutativity, modules behave 
quite differently than vector spaces. For example, there are modules 
that do not have any linearly independent elements. Of course, such a 
module cannot have a basis. 

The connection between modules and vector spaces is very simple: 
a vector space is a module over a field. 

Example 4.1

1) If R is a ring, the set $R^n$ of all ordered n-tuples, whose
components lie in R, is an R-module, with addition and scalar
multiplication defined componentwise (just as in $F^n$),

$$(a_1, \ldots, a_n) + (b_1, \ldots, b_n) = (a_1 + b_1, \ldots, a_n + b_n)$$

and

$$r(a_1, \ldots, a_n) = (ra_1, \ldots, ra_n)$$

for $a_i, b_i, r \in R$. For example, $\mathbb{Z}^n$ is the Z-module
of all ordered n-tuples of integers.

2) If R is a ring, the set $\mathcal{M}_{m,n}(R)$ of all matrices of
size m x n is an R-module, under the usual operations of matrix
addition and scalar multiplication over R. Since R is a ring, we can
also take the product of matrices of compatible sizes. One important
example is when R = F[x], whence $\mathcal{M}_{m,n}(F[x])$ is the
F[x]-module of all m x n matrices whose entries are polynomials.

3) Any commutative ring R with identity is a module over itself,
that is, R is an R-module. In this case, scalar multiplication is
just multiplication by elements of R, that is, scalar multiplication
is the ring multiplication. The defining properties of a ring imply
that the defining properties of an R-module are satisfied. We
shall use this example many times in the sequel. □

When we turn in a later chapter to the study of the structure of a
linear transformation $\tau \in \mathcal{L}(V)$, we will think of V
as having the structure of a vector space over F, as well as a module
over F[x]. Put another way, V is an abelian group under addition,
with two scalar multiplications: one whose scalars are elements of F
and one whose scalars are polynomials over F. This viewpoint will be
of tremendous benefit for the study of $\tau$. For now, we
concentrate only on modules.

Many of the basic concepts that we defined for vector spaces can 
also be defined for modules, although their properties are often quite 
different. 



Submodules 

The definition of submodule parallels that of subspace. 

Definition A submodule of an R-module M is a subset S of M that 
is an R-module in its own right, under the operations obtained by 
restricting the operations of M to S. D 

Theorem 4.1 A nonempty subset S of an R-module M is a
submodule if and only if

$$r, s \in R,\ u, v \in S \implies ru + sv \in S$$ ∎

Theorem 4.2 If S and T are submodules of M, then $S \cap T$ and

$$S + T = \{u + v \mid u \in S,\ v \in T\}$$

are also submodules of M. □

Recall that a commutative ring R with identity is a module over
itself. Thus, in a sense, R plays two roles: it is a ring and it is
an R-module. Now suppose that S is a submodule of R. According to
Theorem 4.1, if $a, b \in S$ and $r \in R$, then $a - b \in S$ and
$ra \in S$. Hence, S is an ideal of the ring R. Conversely, if
$\mathcal{I}$ is an ideal of the ring R, then $\mathcal{I}$ is also a
submodule of the module R. In other words, the submodules of the
R-module R are precisely the ideals of the ring R.

Direct Sums 

The definition of direct sum is the same for modules as for vector 
spaces. We will confine our attention to the direct sum of a finite 
number of modules. 

Definition Let M be an R-module. We say that M is the direct sum
of the submodules $S_1, \ldots, S_n$ if every $v \in M$ can be
written, in a unique way (except for order), as a sum of elements
from the submodules $S_i$. More specifically, M is the direct sum of
$S_1, \ldots, S_n$ if, for all $v \in M$, we have

$$v = u_1 + \cdots + u_n$$

for some $u_i \in S_i$, and furthermore, if

$$v = w_1 + \cdots + w_n$$

where $w_i \in S_i$, then $w_i = u_i$ for all $i = 1, \ldots, n$.

If M is the direct sum of $S_1, \ldots, S_n$, we write

$$M = S_1 \oplus \cdots \oplus S_n$$

and refer to each $S_i$ as a direct summand of M. If
$M = S \oplus S^c$, we refer to $S^c$ as a complement of S in M. □

In the case of vector spaces, every subspace has a complement. 
However, as the next example shows, this is not true for modules. 

Example 4.2 The set $\mathbb{Z}$ of integers is a Z-module, that is,
$\mathbb{Z}$ is a module over itself. Let us examine the nature of
the submodules of $\mathbb{Z}$. Since the submodules of the Z-module
$\mathbb{Z}$ are precisely the ideals of the ring $\mathbb{Z}$, and
since $\mathbb{Z}$ is a principal ideal domain (see Chapter 0), the
submodules of $\mathbb{Z}$ are precisely the sets

$$\langle n \rangle = \mathbb{Z}n = \{zn \mid z \in \mathbb{Z}\}$$

Thus, all nonzero submodules of $\mathbb{Z}$ are of the form
$\mathbb{Z}u$, for some positive $u \in \mathbb{Z}$. As a result, we
see that any two nonzero submodules of $\mathbb{Z}$ have nonzero
intersection. For if $u, v > 0$, then
$0 \neq uv \in \mathbb{Z}u \cap \mathbb{Z}v$. Hence, none of the
submodules $\mathbb{Z}u$, for $u \neq 0$ or 1, have complements. □

As with vector spaces, we have the following useful 
characterization of direct sums. 

Theorem 4.3 A module M is the direct sum of submodules
$S_1, \ldots, S_n$ if and only if

1) $M = S_1 + \cdots + S_n$

2) For each $i = 1, \ldots, n$,

$$S_i \cap \Big(\sum_{j \neq i} S_j\Big) = \{0\}$$ ∎



Spanning Sets 

The concept of spanning set carries over to modules as well. 

Definition The submodule spanned (or generated) by a subset S of a
module M is the set of all linear combinations of elements of S

$$\langle S \rangle = \operatorname{span}(S) = \{r_1 v_1 + \cdots + r_n v_n \mid r_i \in R,\ v_i \in S\}$$

A subset $S \subseteq M$ is said to span M, or generate M, if

$$M = \operatorname{span}(S)$$

that is, if every $v \in M$ can be written in the form

$$v = r_1 v_1 + \cdots + r_n v_n$$

for some $r_1, \ldots, r_n \in R$, and $v_1, \ldots, v_n \in S$. □

Observe that $\langle v \rangle = Rv = \{rv \mid r \in R\}$ is just
the set of all scalar multiples of v. Since modules of this type are
extremely important, they have a special name.

Definition Let M be an R-module. A submodule of the form
$\langle v \rangle = Rv = \{rv \mid r \in R\}$, for $v \in M$, is
called the cyclic submodule generated by v. □

For reasons that will become clear soon, we need the following
definition.

Definition An R-module M is said to be finitely generated if it
contains a finite set that generates M. □

Of course, a vector space is finitely generated if and only if it has
a finite basis, that is, if and only if it is finite dimensional.
However, for modules, things are not quite as simple. The following
is an example of a finitely generated module that has a submodule
that is not finitely generated.

Example 4.3 Let R be the ring $F[x_1, x_2, \ldots]$ of all
polynomials in infinitely many variables over a field F. It will be
convenient to use the boldface letter x to denote
$x_1, x_2, \ldots$, and write a polynomial in R in the form p(x).
(Each polynomial in R, being a finite sum, involves only finitely
many variables, however.) Then R is an R-module, and as such, is
finitely generated by the identity element p(x) = 1.

Now, consider the submodule S of all polynomials with zero
constant term. This module is generated by the variables themselves,

$$S = \langle x_1, x_2, \ldots \rangle$$

However, S is not generated by any finite set of polynomials. For
suppose that $\{p_1, \ldots, p_n\}$ is a finite generating set for S.
Then, for each k, there exist polynomials
$a_{k,1}(x), \ldots, a_{k,n}(x)$ for which

(4.2) $x_k = \sum_{i=1}^{n} a_{k,i}(x)\,p_i(x)$

Note that since $p_i(x) \in S$, it has zero constant term.

Since there are only a finite number of variables involved in all of
the $p_i(x)$'s, we can choose an index k for which
$p_1(x), \ldots, p_n(x)$ do not involve $x_k$. For each
$a_{k,i}(x)$, let us collect all terms involving $x_k$ and all terms
not involving $x_k$,

(4.3) $a_{k,i}(x) = x_k q_i(x) + r_i(x)$

where $q_i(x)$ is any polynomial in R, and $r_i(x)$ does not involve
$x_k$. Now (4.2) and (4.3) give

$$
\begin{aligned}
x_k &= \sum_{i=1}^{n} [x_k q_i(x) + r_i(x)]\,p_i(x) \\
&= x_k \sum_{i=1}^{n} q_i(x)\,p_i(x) + \sum_{i=1}^{n} r_i(x)\,p_i(x)
\end{aligned}
$$

The last sum does not involve $x_k$ and so it must equal 0. Hence,
the first sum must equal 1, but this is not possible, since the
$p_i(x)$'s have no constant terms. Hence, S has no finite generating
set. □

Linear Independence

The concept of linear independence also carries over to modules. 

Definition A nonempty subset S of a module M is linearly
independent if for any distinct $v_1, \ldots, v_n \in S$ and any
$r_1, \ldots, r_n \in R$,

$$r_1 v_1 + \cdots + r_n v_n = 0 \implies r_1 = \cdots = r_n = 0$$

If a set S is not linearly independent, we say that it is linearly
dependent. □

It is clear from the definition that any nonempty subset of a
linearly independent set is linearly independent.

In a vector space, the set S = {v}, consisting of a single nonzero
vector v, is linearly independent. However, in a module, this need not
be the case.

Example 4.4 The abelian group $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$
is a Z-module, with scalar multiplication defined by
$za = (z \cdot a) \bmod n$, for all $z \in \mathbb{Z}$ and
$a \in \mathbb{Z}_n$. However, since $na = 0$ for all
$a \in \mathbb{Z}_n$, we see that no singleton set {a} is linearly
independent. □
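
The failure of independence in Example 4.4 can be made concrete by
exhibiting, for each element, a nonzero integer scalar that
annihilates it. The Python sketch below takes n = 6 as an arbitrary
choice; the helper names scalar_mul and annihilating_scalar are
introduced here for illustration only.

```python
from math import gcd

def scalar_mul(z, a, n):
    """Scalar multiplication in the Z-module Z_n: za = (z * a) mod n."""
    return (z * a) % n

def annihilating_scalar(a, n):
    """Smallest positive integer z with za = 0 in Z_n."""
    return n // gcd(n, a)

n = 6
for a in range(n):
    z = annihilating_scalar(a, n)
    # z is a nonzero scalar with za = 0, so {a} is linearly dependent.
    print(a, z, scalar_mul(z, a, n))
```

The scalar n itself always works, as in the example; the function
above merely returns the smallest positive witness.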

Recall that, in a vector space, a set S of vectors is linearly 
dependent if and only if some vector in S is a linear combination of 
the other vectors in S. For arbitrary modules, this is not true. 

Example 4.5 Consider the Z-module $\mathbb{Z}^2$, consisting of all
ordered pairs of integers. Then the ordered pairs (2,0) and (3,0) are
linearly dependent, since

$$3(2,0) - 2(3,0) = (0,0)$$

but neither one of these ordered pairs is a linear combination (i.e.,
scalar multiple) of the other! □

The problem in the previous example is that

$$r_1 v_1 + \cdots + r_n v_n = 0$$

and (say) $r_1 \neq 0$ together imply that

$$r_1 v_1 = -r_2 v_2 - \cdots - r_n v_n$$

but, in general, we cannot divide both sides by $r_1$, since it may
not have a multiplicative inverse in the ring R.
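
Both claims of Example 4.5 can be verified mechanically. In the
Python sketch below, pairs of integers are tuples, and the
hypothetical helper is_integer_multiple decides whether one pair is
an integer scalar multiple of another.

```python
def scale(z, v):
    """Integer scalar multiple of a tuple of integers."""
    return tuple(z * x for x in v)

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(x + y for x, y in zip(u, v))

def is_integer_multiple(v, u):
    """Return True if v = z*u for some integer z."""
    if all(x == 0 for x in u):
        return all(y == 0 for y in v)
    # Use the first nonzero component of u to determine the only candidate z.
    i = next(k for k, x in enumerate(u) if x != 0)
    if v[i] % u[i] != 0:
        return False
    z = v[i] // u[i]
    return scale(z, u) == v

a, b = (2, 0), (3, 0)
print(add(scale(3, a), scale(-2, b)))                        # the dependence relation
print(is_integer_multiple(b, a), is_integer_multiple(a, b))  # neither is a multiple
```

Over the rationals the same test would succeed with z = 3/2, which is
exactly the division step that is unavailable in the ring of
integers.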

We can now define the concept of a basis for a module.

Definition Let M be an R-module. A subset $\mathcal{B}$ of M is a
basis if $\mathcal{B}$ is linearly independent and spans M. □

Theorem 4.4 A subset $\mathcal{B}$ of a module M is a basis if and
only if, for every $v \in M$, there is a unique set of scalars
$r_1, \ldots, r_n$ for which

$$v = r_1 v_1 + \cdots + r_n v_n$$ ∎

In a vector space, a set of vectors is a basis if and only if it is a
minimal spanning set, or equivalently, a maximal linearly independent
set. For modules, the following is the best we can do, in general.

Theorem 4.5 Let $\mathcal{B}$ be a basis for an R-module M. Then

1) $\mathcal{B}$ is a minimal spanning set.

2) $\mathcal{B}$ is a maximal linearly independent set. ∎

The Z-module $\mathbb{Z}_n$ of Example 4.4 is an example of a module
that has no basis, since it has no linearly independent sets. But
since the entire module is a spanning set, we deduce that a minimal
spanning set need not be a basis. In the exercises, the reader is
asked to give an example of a module M that has a finite basis, but
with the property that not every spanning set in M contains a basis,
and not every linearly independent set in M is contained in a basis.

We will continue our discussion of bases for modules in a moment, 
but first let us discuss the module counterpart of linear transformations. 



Homomorphisms 

The term linear transformation is special to vector spaces. 
However, the concept applies to most algebraic structures. 

Definition Let M and N be R-modules. A function
$\tau \colon M \to N$ is said to be a homomorphism if

$$\tau(ru + sv) = r\tau(u) + s\tau(v)$$

for all scalars $r, s \in R$ and $u, v \in M$. The set of all
homomorphisms from M to N is denoted by $\operatorname{Hom}(M,N)$.
Moreover, we have the following definitions.

1) An endomorphism is a homomorphism from M to M.

2) A monomorphism is an injective homomorphism.

3) An epimorphism is a surjective homomorphism.

4) An isomorphism is a bijective homomorphism. □







Theorem 4.6 Let $\tau \in \operatorname{Hom}(M,N)$. The kernel and
image of $\tau$, defined as for linear transformations, by

$$\ker(\tau) = \{v \in M \mid \tau(v) = 0\}$$

and

$$\operatorname{im}(\tau) = \{\tau(v) \mid v \in M\}$$

are submodules of M and N, respectively. ∎



Free Modules 

The fact that not all modules have a basis leads us to make the 
following definition. 

Definition An R-module M is said to be free if it has a basis. If
$\mathcal{B}$ is a basis for M, we say that M is free on
$\mathcal{B}$. □

The next example shows that even free modules are not very much 
like vector spaces. It is an example of a free module that has a 
submodule that is not free! 

Example 4.6 The set $\mathbb{Z} \times \mathbb{Z}$ is a free module
over itself, with basis {(1,1)}. To see this, observe that (1,1) is
linearly independent, since

$$(n,m)(1,1) = (0,0) \implies (n,m) = (0,0)$$

Also, (1,1) spans $\mathbb{Z} \times \mathbb{Z}$, since
$(n,m) = (n,m)(1,1)$.

But the submodule $S = \mathbb{Z} \times \{0\}$ is not free, since it
has no linearly independent elements, and hence no basis. This
follows from the fact that, if $(n,0) \neq (0,0)$, then, for instance
$(0,1)(n,0) = (0,0)$, and so {(n,0)} is not linearly independent. □
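
The two computations in Example 4.6 rely only on componentwise
multiplication of pairs. The following Python sketch checks them over
a small, arbitrarily chosen range of integers; the function name mul
and the constants are illustrative.

```python
def mul(a, b):
    """Multiplication in the ring of integer pairs (componentwise)."""
    return (a[0] * b[0], a[1] * b[1])

ZERO = (0, 0)
ONE = (1, 1)

sample = range(-3, 4)

# (1,1) is linearly independent: a scalar times (1,1) is zero only if
# the scalar itself is zero, because scalar * (1,1) equals the scalar.
independent = all(
    mul((n, m), ONE) != ZERO
    for n in sample for m in sample if (n, m) != ZERO
)

# Every nonzero (n,0) is annihilated by the nonzero scalar (0,1).
annihilated = all(mul((0, 1), (n, 0)) == ZERO for n in sample if n != 0)

print(independent, annihilated)
```

The check over a finite range is of course only an illustration; the
general statements follow from the identities displayed in the
example.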

Since all bases for a vector space V have the same cardinality, 
the concept of vector space dimension is well-defined. We now turn to 
the same issue for modules. The next example shows what can happen 
if the ring R is not commutative — it is an example of a module over a 
noncommutative ring that has a basis of size n for any natural 
number n! 

Example 4.7 Let V be a vector space over F, with a countably
infinite basis $\mathcal{B} = \{b_0, b_1, b_2, \ldots\}$. Let
$R = \mathcal{L}(V)$ be the ring of linear operators on V. Observe
that R is not commutative, since composition of functions is not
commutative.

The ring R is an R-module, and as such, the identity map $\iota$
forms a basis for R. However, we can also construct a basis for R of
any desired finite size n. We begin by partitioning $\mathcal{B}$
into n blocks. For each $s = 0, 1, \ldots, n-1$, let

$$\mathcal{B}_s = \{b_i \mid i \equiv s \bmod n\} = \{b_i \mid i = kn + s \text{ for some } k\}$$

Now we define elements $\beta_s \in R = \mathcal{L}(V)$ by their
action on the basis vectors in $\mathcal{B}$ as follows. The
intention is that $\beta_s$ is zero on all basis vectors not in
$\mathcal{B}_s$, and $\beta_s$ takes $b_{kn+s} \in \mathcal{B}_s$ to
$b_k$. Since any nonnegative integer i has the form kn + t for unique
k and t satisfying $0 \le t < n$, we can define $\beta_s$ by

$$
\beta_s(b_{kn+t}) =
\begin{cases}
b_k & \text{if } t = s \\
0 & \text{if } t \neq s
\end{cases}
$$

Now $\{\beta_0, \ldots, \beta_{n-1}\}$ is linearly independent. For
if $\alpha_s \in \mathcal{L}(V)$, and

$$0 = \alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1}$$

then, applying this to $b_{kn+t}$ gives

$$0 = \alpha_t\beta_t(b_{kn+t}) = \alpha_t(b_k)$$

for all k. Hence, $\alpha_t = 0$.

Also, $\{\beta_0, \ldots, \beta_{n-1}\}$ spans
$R = \mathcal{L}(V)$. For if $\tau \in \mathcal{L}(V)$, we define
$\alpha_s \in \mathcal{L}(V)$ by

$$\alpha_s(b_k) = \tau(b_{kn+s})$$

Then

$$(\alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1})(b_{kn+t}) = \alpha_t\beta_t(b_{kn+t}) = \alpha_t(b_k) = \tau(b_{kn+t})$$

and so

$$\tau = \alpha_0\beta_0 + \cdots + \alpha_{n-1}\beta_{n-1}$$

which shows that
$\tau \in \operatorname{span}\{\beta_0, \ldots, \beta_{n-1}\}$. Thus,
$\{\beta_0, \ldots, \beta_{n-1}\}$ is a basis for $\mathcal{L}(V)$,
and we have shown that $\mathcal{L}(V)$ has a basis of any finite
size n. □
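
The spanning argument of Example 4.7 can be simulated on finitely
many basis vectors. The Python sketch below is a toy model under
several assumptions: basis vectors are indexed from 0, a vector is a
dictionary mapping basis indices to nonzero coefficients, a linear
operator is a function from a basis index to such a dictionary, and
the operator tau is an arbitrary example chosen for the test. All
names are hypothetical.

```python
def apply(op, vec):
    """Extend an operator defined on basis indices linearly to a vector."""
    out = {}
    for i, c in vec.items():
        for j, d in op(i).items():
            out[j] = out.get(j, 0) + c * d
    return {j: c for j, c in out.items() if c != 0}

def beta(s, n):
    """beta_s sends the basis vector with index kn+s to the one with index k,
    and every other basis vector to zero."""
    def op(i):
        k, t = divmod(i, n)
        return {k: 1} if t == s else {}
    return op

def alpha(tau, s, n):
    """alpha_s sends the basis vector with index k to tau of index kn+s."""
    return lambda k: tau(k * n + s)

def combination(tau, n):
    """The operator alpha_0 beta_0 + ... + alpha_{n-1} beta_{n-1}."""
    def op(i):
        total = {}
        for s in range(n):
            image = apply(alpha(tau, s, n), beta(s, n)(i))
            for j, c in image.items():
                total[j] = total.get(j, 0) + c
        return {j: c for j, c in total.items() if c != 0}
    return op

def tau(i):
    """Example operator: index i goes to (index i+1) plus twice (index 0)."""
    return {i + 1: 1, 0: 2}

n = 3
print(all(combination(tau, n)(i) == tau(i) for i in range(50)))
```

Only the block containing a given basis vector contributes to the
sum, which is why the combination reproduces tau on every basis
vector tested.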



Example 4.7 shows that modules over noncommutative rings can 
behave very poorly when it comes to bases. Fortunately, when the ring 
of scalars is commutative, things are much nicer. We will postpone the 
proof of the following theorem to the next chapter. 

Theorem 4.7 Let M be a free R-module. (By our definition, R is a 
commutative ring with identity.) Then any two bases of M have the 
same cardinality. I 

Theorem 4.7 allows us to define the rank of a free module. (In the
case of modules, it is customary to use the term rank, rather than
dimension.)

Definition Let M be a free R-module. We define the rank
$\operatorname{rk}(M)$ of M to be the cardinality of any basis for
M. □

Recall that if B is a basis for a vector space V over F, then V
is isomorphic to the vector space $(F^B)_0$ of all functions from B
to F that have finite support. A similar result holds for free
R-modules. We begin by establishing that $(R^B)_0$ is a free
R-module.

Theorem 4.8 Let B be any set, and let R be a ring. The set
$(R^B)_0$ of all functions from B to R that have finite support is a
free R-module, with basis $\{\delta_b \mid b \in B\}$ defined by

$$
\delta_b(x) =
\begin{cases}
1 & \text{if } x = b \\
0 & \text{if } x \neq b
\end{cases}
$$

and rank |B|. This basis is referred to as the standard basis for
$(R^B)_0$. ∎



Theorem 4.9 Let M be an R-module. If B is a basis for M, then
M is isomorphic to $(R^B)_0$.

Proof. Since B is a basis for M, any $v \in M$ has a unique
representation (up to order) as a linear combination of elements of
B. If

$$v = r_1 b_1 + \cdots + r_n b_n$$

then we let $\hat{v} \in (R^B)_0$ be the function defined by

$$
\hat{v}(b) =
\begin{cases}
r_i & \text{if } b = b_i \text{ for some } i \\
0 & \text{if } b \neq b_i \text{ for any } i
\end{cases}
$$

In words, $\hat{v}$ is the function that assigns to each basis
element $b \in B$ the coefficient of b in the expression of v as a
linear combination of basis elements. This defines a map
$\tau \colon M \to (R^B)_0$, by $\tau(v) = \hat{v}$.

It is easy to see that $\tau$ is a module homomorphism from M to
$(R^B)_0$. Furthermore, $\tau$ is injective, since
$\tau(v) = \hat{v} = 0$ implies that the coordinates of v with
respect to all basis elements are 0, and so v must be 0. Also,
$\tau$ is surjective, since if $f \in (R^B)_0$, then we may define
$v \in M$ by

$$v = \sum_{b \in B} f(b)\,b$$

Since f has finite support, this is a finite linear combination of
basis elements. Moreover, $\tau(v) = \hat{v}$ has the property that
$\hat{v}(b) = f(b)$ for all $b \in B$, and so
$\tau(v) = \hat{v} = f$. Thus, $\tau$ is an isomorphism from M to
$(R^B)_0$. ∎

Corollary 4.10 Two free R-modules are isomorphic if and only if they
have the same rank.

Proof. If $M \approx N$, then any isomorphism $\tau$ from M to N maps
a basis for M to a basis for N. Since $\tau$ is a bijection, we have
$\operatorname{rk}(M) = \operatorname{rk}(N)$. Conversely, suppose
that $\operatorname{rk}(M) = \operatorname{rk}(N)$. Let
$\mathcal{B}$ be a basis for M and let $\mathcal{C}$ be a basis for
N. Since $|\mathcal{B}| = |\mathcal{C}|$, there is a bijective map
$\tau \colon \mathcal{B} \to \mathcal{C}$. This map can be extended
by linearity to an isomorphism of M onto N, and so $M \approx N$. ∎



Summary 

Here is a list of some of the properties of modules that emphasize
the differences between modules and vector spaces.

1) A submodule of a module need not have a complement.

2) A submodule of a finitely generated module need not be finitely
generated.

3) There exist modules with no linearly independent elements, and
hence with no basis.

4) In a module, there may exist a set S of linearly dependent
elements for which no element in S is a linear combination of the
other elements in S.

5) In a module, a minimal spanning set is not necessarily a basis.

6) In a module, a maximal linearly independent set is not necessarily
a basis. In fact, maximal linearly independent sets need not even
exist.

7) A module over a noncommutative ring may have bases of different
sizes. However, all bases for a free module over a commutative
ring with identity have the same size.

8) There exist free modules with linearly independent sets that are 
not contained in a basis, and spanning sets that do not contain a 
basis. 



EXERCISES 

1. Give the details to show that any commutative ring with identity 
is a module over itself. 

2. Let M be an R-module, and let I be an ideal in R. Let IM
be the set of all finite sums of the form

$$r_1 v_1 + \cdots + r_n v_n$$

where $r_i \in I$ and $v_i \in M$. Is IM a submodule of M?

3. Show that if S and T are submodules of M, then (with respect
to set inclusion)

$$S \cap T = \operatorname{glb}\{S,T\} \quad\text{and}\quad S + T = \operatorname{lub}\{S,T\}$$

4. Let $S_1 \subseteq S_2 \subseteq \cdots$ be an ascending sequence
of submodules of an R-module M. Prove that the union $\bigcup S_i$
is a submodule of M.

5. Is it true that a subset S of a module M is linearly independent
if and only if every element of span(S) can be expressed as a
unique linear combination of elements of S? Explain.

6. Consider the Z-module $\mathbb{Z}_n = \{0, \ldots, n-1\}$, with
scalar multiplication defined by

$$zu = (z \cdot u) \bmod n$$

for $z \in \mathbb{Z}$ and $u \in \mathbb{Z}_n$. Which subsets of
$\mathbb{Z}_n$ (if any) are linearly independent?

7. Give an example of a module M that has a finite basis, but with 
the property that not every spanning set in M contains a basis, 
and not every linearly independent set in M is contained in a 
basis. 

8. Let $\tau \in \operatorname{Hom}(M,N)$ be an isomorphism. If
$\mathcal{B}$ is a basis for M, prove that
$\tau(\mathcal{B}) = \{\tau(b) \mid b \in \mathcal{B}\}$ is a basis
for N.

9. Consider the ring R = F[x,y] of polynomials in two variables.
Show that the set M consisting of all polynomials in R that
have zero constant term, is an R-module. Show that M is not a
free R-module.

10. Referring to Example 4.7, where $R = \mathcal{L}(V)$, show that
$R^n$ is isomorphic to $R^m$ for all n and m. (By $R^n$, we mean the
set of all ordered n-tuples of elements of R.)

11. How does the proof of Corollary 4.10 use the fact that R is a 
commutative ring? 

12. Prove that if a ring R has the property that every finitely
generated R-module is free, then either R is the zero ring or R
is a field.

13. Let I be an ideal in R. Prove that I is a free R-module if and
only if I is a principal ideal, generated by an element in R that
is not a zero divisor.

14. Let M be an R-module. An element $v \in M$ is called a torsion
element if there exists a nonzero $r \in R$ for which rv = 0.

a) Prove that if R is an integral domain, then the set Tor(M)
of torsion elements in M forms a submodule of M.

b) Find an example of a ring R with the property that, 
thinking of R as an R-module, the set Tor(R) is not a 
submodule of R. 




CHAPTER 5 

Modules II 



Contents: Quotient Modules, Quotient Rings and Maximal Ideals,
Noetherian Modules, The Hilbert Basis Theorem, Exercises.



Quotient Modules 

The procedure for defining quotient modules is the same as that 
for defining quotient spaces. We summarize in the following theorem. 

Theorem 5.1 Let S be a submodule of an R-module M. The binary
relation

$$u \equiv v \iff u - v \in S$$

is an equivalence relation on M, whose equivalence classes are the
cosets

$$v + S = \{v + s \mid s \in S\}$$

of S in M. The set M/S of all cosets of S in M, called the
quotient module of M modulo S, is an R-module under the well-defined
operations

$$(u + S) + (v + S) = (u + v) + S \quad\text{and}\quad r(u + S) = ru + S$$

The zero element in M/S is the coset 0 + S = S. ∎

It is left to the reader to formulate and prove precise statements of 
the three isomorphism theorems for modules that correspond to the 
isomorphism theorems of Chapter 3. 

One question that immediately comes to mind is whether or not a
quotient module of a free module need be free. As the next example
shows, the answer is no.

Example 5.1 As a module over itself, the $\mathbb{Z}$-module $\mathbb{Z}$ is free on the
set $\{1\}$. The set $\mathbb{Z}n = \{zn \mid z \in \mathbb{Z}\}$ is a (cyclic) submodule of $\mathbb{Z}$, but
the quotient $\mathbb{Z}$-module $\mathbb{Z}/\mathbb{Z}n$ is isomorphic to $\mathbb{Z}_n$, via the map

$$\tau(u + \mathbb{Z}n) = u \bmod n$$

and since $\mathbb{Z}_n$ is not free as a $\mathbb{Z}$-module (for $n > 1$), neither is $\mathbb{Z}/\mathbb{Z}n$. □
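
The reason the quotient fails to be free can be checked by brute force: every element x of the integers modulo n satisfies nx = 0 with n a nonzero integer, so every element is a torsion element and no nonempty subset can be linearly independent over the integers. The following Python sketch verifies this for small n. The function names are illustrative only and do not come from the text.

```python
def is_torsion_element(x: int, n: int) -> bool:
    """Return True if some nonzero integer r satisfies r*x = 0 in Z_n.

    Searching r = 1, ..., n suffices, because n*x is always 0 modulo n.
    """
    return any((r * x) % n == 0 for r in range(1, n + 1))


def all_elements_are_torsion(n: int) -> bool:
    """Check that every element of Z_n is a torsion element over Z."""
    return all(is_torsion_element(x, n) for x in range(n))


if __name__ == "__main__":
    for n in (2, 6, 12):
        print(n, all_elements_are_torsion(n))
```

Since a basis element b of a free module must satisfy rb = 0 only for r = 0, a nonzero module in which every element is torsion cannot have a basis.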



Quotient Rings and Maximal Ideals 

In order to prove Theorem 4.7, we need a few more facts about 
rings. The construction of quotient spaces and quotient modules works 
equally well for other algebraic structures. For rings, it proceeds as 
follows. Let S be a subring of a commutative ring R with identity. 
Then the set of all cosets 

$$R/S = \{r + S \mid r \in R\}$$

is easily seen to be an abelian group under coset addition

$$(a+S) + (b+S) = (a+b) + S$$

In order for the product

$$(a+S)(b+S) = ab + S$$

to be well-defined, we must have

$$b + S = b' + S \;\Rightarrow\; ab + S = ab' + S$$

or, equivalently,

$$b - b' \in S \;\Rightarrow\; a(b - b') \in S$$

But $b - b'$ may be any element of S, and a may be any element of
R, and so this condition implies that S must be an ideal. Conversely,
if S is an ideal, then coset multiplication is well-defined.

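Theorem 5.2 Let R be a commutative ring with identity. If $\mathcal{I}$ is any
ideal of R, then the set $R/\mathcal{I}$ of all cosets of $\mathcal{I}$ in R is a ring, called
the quotient ring of R modulo $\mathcal{I}$, where addition and multiplication
are defined by

$$(a+\mathcal{I}) + (b+\mathcal{I}) = (a+b)+\mathcal{I}$$

$$(a+\mathcal{I})(b+\mathcal{I}) = ab+\mathcal{I}$$ ■

Definition An ideal $\mathcal{I}$ in a ring R is a maximal ideal if $\mathcal{I} \neq R$, and if
whenever $\mathcal{J}$ is an ideal satisfying $\mathcal{I} \subseteq \mathcal{J} \subseteq R$, then either $\mathcal{J} = \mathcal{I}$ or
$\mathcal{J} = R$. □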
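Here is one reason why maximal ideals are important.

Theorem 5.3 Let R be a commutative ring with identity. Then the
quotient ring $R/\mathcal{I}$ is a field if and only if $\mathcal{I}$ is a maximal ideal.

Proof. Suppose first that $R/\mathcal{I}$ is a field. Assume that $\mathcal{I}$ is not
maximal, and so there exists an ideal $\mathcal{J}$ with the property that
$\mathcal{I} \subsetneq \mathcal{J} \subsetneq R$. Let $j \in \mathcal{J} - \mathcal{I}$, and consider the ideal

$$\mathcal{K} = \langle j, \mathcal{I} \rangle \subseteq \mathcal{J}$$

generated by j and $\mathcal{I}$. Since $j \notin \mathcal{I}$, we have $j+\mathcal{I} \neq 0$ and since $R/\mathcal{I}$
is a field, $j+\mathcal{I}$ must have an inverse, say $j'+\mathcal{I}$, for which

$$(j+\mathcal{I})(j'+\mathcal{I}) = jj'+\mathcal{I} = 1+\mathcal{I}$$

Therefore, $1 - jj' \in \mathcal{I} \subseteq \mathcal{K}$ and since $jj' \in \mathcal{K}$, we have $1 \in \mathcal{K}$, which
implies that $\mathcal{K} = R$. But $\mathcal{K} \subseteq \mathcal{J}$ and $\mathcal{J}$ is a proper subset of R.
This contradiction implies that $\mathcal{I}$ is maximal.

Conversely, suppose that $\mathcal{I}$ is maximal. We want to show that
any nonzero $r+\mathcal{I} \in R/\mathcal{I}$ has an inverse. But if $0 \neq r+\mathcal{I}$, then $r \notin \mathcal{I}$,
and so the ideal $\mathcal{J} = \langle r, \mathcal{I} \rangle$ is strictly larger than $\mathcal{I}$. Since $\mathcal{I}$ is
maximal, we must then have $\mathcal{J} = R$. This implies that $1 \in \mathcal{J}$, and so
there exists $s \in R$ for which $1 = sr + i$, for some $i \in \mathcal{I}$. Hence,

$$(s+\mathcal{I})(r+\mathcal{I}) = sr+\mathcal{I} = (1 - i)+\mathcal{I} = 1+\mathcal{I}$$

and so $(r+\mathcal{I})^{-1} = s+\mathcal{I}$. Hence, $R/\mathcal{I}$ is a field. ■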
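
Theorem 5.3 can be checked by hand in the ring of integers modulo n. The sketch below assumes two standard facts that are not proved in the text: the ideals of this ring are exactly the ideals generated by the divisors d of n, and the quotient by the ideal generated by d is isomorphic to the integers modulo d. Under those assumptions it compares maximality of each ideal with the field property of the quotient. All function names are illustrative.

```python
def divisors(n: int) -> list:
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def ideal_is_maximal(n: int, d: int) -> bool:
    """Decide whether the ideal (d) of Z_n is maximal, for a divisor d of n.

    (d) is contained in (c) exactly when c divides d, and (1) is the whole
    ring, so (d) is maximal when d != 1 and no divisor c of n lies strictly
    between: c | d with c != 1 and c != d.
    """
    if d == 1:
        return False
    return not any(d % c == 0 and c not in (1, d) for c in divisors(n))


def quotient_is_field(d: int) -> bool:
    """Decide whether Z_d is a field by searching for inverses directly."""
    if d < 2:
        return False  # the zero ring is not a field
    return all(
        any((a * b) % d == 1 for b in range(1, d))
        for a in range(1, d)
    )


if __name__ == "__main__":
    n = 12
    for d in divisors(n):
        print(d, ideal_is_maximal(n, d), quotient_is_field(d))
```

For n = 12 the two tests agree on every divisor, and the maximal ideals found are those generated by 2 and by 3.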

We need one more fact in order to prove Theorem 4.7. 

Theorem 5.4 Any commutative ring R with identity contains a 
maximal ideal. 

Proof. Since R is not the zero ring, it has a proper ideal, namely,
{0}. (By a proper ideal, we mean an ideal different from R itself.)
Let $\mathcal{S}$ be the collection of all proper ideals of R. Then $\mathcal{S}$ is
nonempty. If

$$\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \cdots$$

is a chain of proper ideals in R, then the union $\mathcal{I} = \bigcup \mathcal{I}_j$ is also an
ideal. Furthermore, if $\mathcal{I} = R$, then $1 \in \mathcal{I}$, and so $1 \in \mathcal{I}_k$, for some k,
which implies that $\mathcal{I}_k = R$, and this contradicts the fact that $\mathcal{I}_k$ is
proper. Hence, $\mathcal{I} \in \mathcal{S}$. Thus, any chain in $\mathcal{S}$ has an upper bound, and
so Zorn's lemma implies that $\mathcal{S}$ has a maximal element. This shows
that R has a maximal ideal. ■



We are now ready for the proof of Theorem 4.7. 
Theorem 5.5 Let M be a free R-module. Then any two bases of M
have the same cardinality.

Proof. Our plan is quite straightforward. We seek to find a vector
space V with the property that, for any basis for M, there is a basis
of the same cardinality for V. Then we can appeal to the
corresponding result for vector spaces, which we proved in Chapter 1.

Now, according to Theorem 5.4, R has a maximal ideal $\mathcal{I}$, and
according to Theorem 5.3, $R/\mathcal{I}$ is a field. Let

$$\mathcal{I}M = \{a_1v_1 + \cdots + a_nv_n \mid a_i \in \mathcal{I},\ v_i \in M\}$$

Then $\mathcal{I}M$ is a submodule of M, and so we may form the quotient
module $M/\mathcal{I}M$.

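We want to show that $M/\mathcal{I}M$ is a vector space over $R/\mathcal{I}$, with
scalar multiplication defined by

$$(r+\mathcal{I})(u+\mathcal{I}M) = ru+\mathcal{I}M$$

To see that this is well-defined, suppose that

$$r+\mathcal{I} = r'+\mathcal{I} \quad\text{and}\quad u+\mathcal{I}M = u'+\mathcal{I}M$$

We must show that

$$ru+\mathcal{I}M = r'u'+\mathcal{I}M$$

Equivalently, we must show that

$$r - r' \in \mathcal{I},\ u - u' \in \mathcal{I}M \;\Rightarrow\; ru - r'u' \in \mathcal{I}M$$

But

$$r - r' \in \mathcal{I},\ u - u' \in \mathcal{I}M \;\Rightarrow\; (r - r')u' \in \mathcal{I}M \ \text{and}\ r(u - u') \in \mathcal{I}M$$

$$\Rightarrow\; (r - r')u' + r(u - u') = ru - r'u' \in \mathcal{I}M$$

Hence, scalar multiplication is well-defined. We leave it to the reader
to show that the necessary properties of scalar multiplication are
satisfied, and so $M/\mathcal{I}M$ is indeed a vector space over $R/\mathcal{I}$.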

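Let $\mathcal{B}$ be a basis for M over R. If $b_i$ and $b_j$ are distinct elements
of $\mathcal{B}$ then $b_i+\mathcal{I}M$ and $b_j+\mathcal{I}M$ are distinct, for if

$$b_i+\mathcal{I}M = b_j+\mathcal{I}M$$

then $b_i - b_j \in \mathcal{I}M$, and so

$$b_i - b_j = a_1v_1 + \cdots + a_nv_n$$

for $a_k \in \mathcal{I}$, $v_k \in M$. But each $v_k$ is a linear combination of the basis
vectors in $\mathcal{B}$. Let us suppose that the coefficient of $b_i$ in $v_k$ is $r_k$,
for $k = 1,\dots,n$. Equating coefficients of $b_i$ on both sides gives

$$1 = a_1r_1 + \cdots + a_nr_n$$

But the sum on the right side of this equation is in the ideal $\mathcal{I}$, and so
$1 \in \mathcal{I}$, which is a contradiction to the fact that $\mathcal{I}$ is maximal (and
hence proper). Thus, the set

$$\mathcal{B}' = \{b+\mathcal{I}M \mid b \in \mathcal{B}\}$$

has the same cardinality as the basis $\mathcal{B}$ of M. We need only show
that $\mathcal{B}'$ is a basis for the vector space $M/\mathcal{I}M$ over $R/\mathcal{I}$.

It is clear that $\mathcal{B}'$ generates $M/\mathcal{I}M$ over $R/\mathcal{I}$, since $\mathcal{B}$
generates M. To see that $\mathcal{B}'$ is linearly independent, observe that

$$\sum_{j \in U} (r_j+\mathcal{I})(b_j+\mathcal{I}M) = 0 \;\Rightarrow\; \sum_{j \in U} (r_jb_j+\mathcal{I}M) = 0$$

$$\Rightarrow\; \sum_{j \in U} r_jb_j \in \mathcal{I}M \;\Rightarrow\; \sum_{j \in U} r_jb_j = \sum_{i \in V} a_ib_i$$

where each $a_i \in \mathcal{I}$, since every element of $\mathcal{I}M$ is a linear combination
of basis vectors with coefficients in $\mathcal{I}$.
Equating coefficients of $b_j$ on both sides shows that $r_j \in \mathcal{I}$, and so
$r_j+\mathcal{I} = 0$. This shows that $\mathcal{B}'$ is linearly independent. Hence $\mathcal{B}'$ is a
basis for $M/\mathcal{I}M$. Thus, $|\mathcal{B}| = \dim(M/\mathcal{I}M)$ is independent of the
choice of basis $\mathcal{B}$. ■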
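
The reduction step in this proof can be illustrated in the smallest interesting case, the free module of pairs of integers with the maximal ideal generated by a prime p. The sketch below assumes the standard facts that two integer vectors form a basis of this module exactly when their determinant is 1 or -1, and that two vectors over the integers modulo p form a basis exactly when their determinant is nonzero modulo p. The function names are illustrative and not from the text.

```python
def det2(b1, b2) -> int:
    """Determinant of the 2x2 integer matrix with columns b1 and b2."""
    return b1[0] * b2[1] - b1[1] * b2[0]


def is_basis_of_Z2(b1, b2) -> bool:
    """Two integer vectors form a basis of Z^2 exactly when det is +1 or -1."""
    return det2(b1, b2) in (1, -1)


def is_basis_mod_p(b1, b2, p: int) -> bool:
    """The reduced vectors form a basis of (Z_p)^2 when det is nonzero mod p."""
    return det2(b1, b2) % p != 0


if __name__ == "__main__":
    b1, b2 = (2, 1), (1, 1)          # a basis of Z^2 (determinant 1)
    for p in (2, 3, 5):
        print(p, is_basis_mod_p(b1, b2, p))

    c1, c2 = (2, 0), (0, 1)          # independent, but not a basis of Z^2
    print(is_basis_of_Z2(c1, c2), is_basis_mod_p(c1, c2, 2))
```

The second pair shows why the hypothesis that the set is a basis matters: a merely linearly independent set may collapse to a dependent set after passing to the quotient.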

Noetherian Modules 

One of the most desirable properties of a finitely generated R- 
module M is that all of its submodules be finitely generated. Example 
4.3 shows that this is not always the case, and leads us to search for 
conditions on the ring R that will guarantee that any finitely 
generated R-module has only finitely generated submodules. 

Definition An R-module M is said to satisfy the ascending chain
condition on submodules if, for any ascending sequence of submodules

$$S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots$$

of M, there exists an index k for which $S_k = S_{k+1} = S_{k+2} = \cdots$. □

Put less formally, an R-module satisfies the ascending chain 
condition (abbreviated a.c.c.) on submodules if any ascending chain of 
submodules eventually becomes constant. 

Theorem 5.6 The following are equivalent for an R-module M. 

1) Every submodule of M is finitely generated. 

2) M satisfies the a.c.c. on submodules. 

Any module that satisfies either of these conditions is called a
noetherian module (after Emmy Noether, one of the pioneers of module
theory).

Proof. Suppose that all submodules of M are finitely generated, and
that M contains an infinite ascending sequence

(5.1)  $S_1 \subseteq S_2 \subseteq S_3 \subseteq \cdots$

of submodules. Then the union

$$S = \bigcup_j S_j$$

is easily seen to be a submodule of M. Hence, S is finitely generated,
and $S = \langle u_1,\dots,u_n \rangle$, for some $u_i \in M$. Since $u_i \in S$, there exists an
index $k_i$ such that $u_i \in S_{k_i}$. Therefore, if $k = \max\{k_1,\dots,k_n\}$, we
have

$$u_i \in S_k \quad\text{for all } i = 1,\dots,n$$

and so

$$S = \langle u_1,\dots,u_n \rangle \subseteq S_k \subseteq S_{k+1} \subseteq S_{k+2} \subseteq \cdots \subseteq S$$

which shows that the submodules in the chain (5.1), from $S_k$ on, are
equal.

For the converse, we must show that if M satisfies the a.c.c. on
submodules, then every submodule of M is finitely generated. To this
end, let S be a submodule of M. Pick $u_1 \in S$, and consider the
submodule $S_1 = \langle u_1 \rangle \subseteq S$ generated by $u_1$. If $S_1 = S$, then S is
finitely generated. If $S_1 \neq S$, then there is a $u_2 \in S - S_1$. Now let
$S_2 = \langle u_1,u_2 \rangle$. If $S_2 = S$, then S is finitely generated. If $S_2 \neq S$, then
pick $u_3 \in S - S_2$, and consider the submodule $S_3 = \langle u_1,u_2,u_3 \rangle$.

Continuing in this way, we get an ascending chain of submodules

$$\langle u_1 \rangle \subseteq \langle u_1,u_2 \rangle \subseteq \langle u_1,u_2,u_3 \rangle \subseteq \cdots \subseteq S$$

If none of these submodules is equal to S, we would have an infinite
ascending chain of submodules, each properly contained in the next,
which contradicts the fact that M satisfies the a.c.c. on submodules.
Hence, $S = \langle u_1,\dots,u_n \rangle$, for some n, and so S is finitely generated. ■
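
The chain construction used in the converse can be watched in the ring of integers, where every ideal is principal and the ideal generated by a list of integers is generated by their greatest common divisor. The following sketch, with an illustrative function name, records a generator of each ideal in the chain obtained by adjoining one element at a time.

```python
from math import gcd


def ideal_chain_generators(elements):
    """Generators of the ideals (a_1), (a_1, a_2), (a_1, a_2, a_3), ... in Z.

    The ideal generated by a_1, ..., a_k is generated by gcd(a_1, ..., a_k),
    so each entry of the result divides the entry before it.
    """
    generators = []
    g = 0  # gcd(0, a) == |a|, so 0 is a neutral starting value
    for a in elements:
        g = gcd(g, a)
        generators.append(g)
    return generators


if __name__ == "__main__":
    print(ideal_chain_generators([360, 84, 90, 7, 5, 11]))
    # → [360, 12, 6, 1, 1, 1]
```

Each generator divides its predecessor, so the ideals ascend, and because a positive integer has only finitely many divisors the chain must become constant, which is the a.c.c. for this ring.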

Since a ring R is a module over itself, and since the submodules 
of the module R are precisely the ideals of the ring R, the preceding 
may be formulated for rings as follows. 

Definition A ring R is said to satisfy the ascending chain condition on
ideals if, for any ascending sequence of ideals

$$\mathcal{I}_1 \subseteq \mathcal{I}_2 \subseteq \mathcal{I}_3 \subseteq \cdots$$

of R, there exists an index k for which $\mathcal{I}_k = \mathcal{I}_{k+1} = \mathcal{I}_{k+2} = \cdots$. □

Theorem 5.7 The following are equivalent for a ring R.

1) Every ideal of R is finitely generated (as an R-module).

2) R satisfies the a.c.c. on ideals.

Any ring that satisfies either of the conditions is called a noetherian
ring. ■



Now we are ready for the main result of this section. 

Theorem 5.8 If R is noetherian, then so is any finitely generated R- 
module. 

Proof. Let M be a finitely generated R-module, say $M = \langle u_1,\dots,u_n \rangle$.
Consider the epimorphism $\tau: R^n \to M$ defined by

$$\tau(r_1,\dots,r_n) = r_1u_1 + \cdots + r_nu_n$$

Let S be a submodule of M. Then

$$\tau^{-1}(S) = \{u \in R^n \mid \tau(u) \in S\}$$

is a submodule of $R^n$, and $\tau(\tau^{-1}(S)) = S$. Now suppose that $R^n$ has
only finitely generated submodules, and so $\tau^{-1}(S)$ is finitely
generated, say, $\tau^{-1}(S) = \langle v_1,\dots,v_k \rangle$. Then if $w \in S$, we have $w = \tau(v)$
for some $v \in \tau^{-1}(S)$, and since

$$v = r_1v_1 + \cdots + r_kv_k$$

we get

$$w = \tau(v) = r_1\tau(v_1) + \cdots + r_k\tau(v_k)$$

which implies that S is finitely generated, by $\tau(v_1),\dots,\tau(v_k)$.
Therefore, the proof will be complete if we can show that every
submodule of $R^n$ is finitely generated.

We do this by induction on n. If n = 1, the result is clear, since
the submodules of R are its ideals and R is noetherian.
Suppose that $R^k$ has only finitely generated submodules, for all
$1 \leq k < n$. Let S be a submodule of $R^n$, and consider the sets

$$S_1 = \{(s_1,\dots,s_{n-1},0) \mid (s_1,\dots,s_{n-1},0) \in S\}$$

and

$$S_2 = \{s_n \mid (s_1,\dots,s_{n-1},s_n) \in S \text{ for some } s_1,\dots,s_{n-1}\}$$

It is easy to see that $S_1$ is a submodule of S and that $S_2$ is a
submodule of R. Moreover, $S_1$ is isomorphic to a submodule of $R^{n-1}$,
obtained by simply dropping the last coordinate,

$$S_1 \cong \{(s_1,\dots,s_{n-1}) \mid (s_1,\dots,s_{n-1},0) \in S_1\} \subseteq R^{n-1}$$

Therefore, the induction hypothesis (and the isomorphism) imply that
$S_1$ and $S_2$ are finitely generated, say

$$S_1 = \langle u_1,\dots,u_s \rangle \quad\text{and}\quad S_2 = \langle a_1,\dots,a_t \rangle$$

For each $a_i$, choose $v_i \in S$ whose last coordinate is $a_i$. If $w \in S$
has last coordinate $r_1a_1 + \cdots + r_ta_t$, then $w - (r_1v_1 + \cdots + r_tv_t)$ lies
in S and has last coordinate 0, and so it lies in $S_1$. Hence, S
is finitely generated, by $\{u_1,\dots,u_s,v_1,\dots,v_t\}$. ■



The Hilbert Basis Theorem 

Theorem 5.8 naturally leads us to ask which familiar rings are 
noetherian. We leave it to the reader to show that a commutative ring 
R with identity is a field if and only if its only ideals are {0} and R. 
Hence, a field is a noetherian ring. Also, any principal ideal domain is 
noetherian. The following theorem describes some additional
noetherian rings.

Theorem 5.9 (Hilbert basis theorem) If a ring R is noetherian,
then so is the polynomial ring R[x].

Proof. We wish to show that any ideal $\mathcal{I}$ in R[x] is finitely
generated. Let L denote the set of all leading coefficients of
polynomials in $\mathcal{I}$, together with the 0 element of R. Then L is an
ideal of R.

To see this, observe that if $\alpha \in L$ is the leading coefficient of
$f(x) \in \mathcal{I}$, and if $r \in R$, then either $r\alpha = 0$ or else $r\alpha$ is the leading
coefficient of $rf(x) \in \mathcal{I}$. In either case, $r\alpha \in L$. Similarly, suppose that
$\beta \in L$ is the leading coefficient of $g(x) \in \mathcal{I}$. We may assume that
$\deg f(x) = i$ and $\deg g(x) = j$, with $i \leq j$. Then $h(x) = x^{j-i}f(x)$ is in
$\mathcal{I}$, has leading coefficient $\alpha$, and has the same degree as g(x). Hence,
$\alpha - \beta$ is either 0 or it is the leading coefficient of $h(x) - g(x) \in \mathcal{I}$. In
either case $\alpha - \beta \in L$.

Since L is an ideal of the noetherian ring R, it must be finitely
generated, say $L = \langle \alpha_1,\dots,\alpha_k \rangle$. Since $\alpha_i \in L$, there exist polynomials
$f_i(x) \in \mathcal{I}$ with leading coefficients $\alpha_i$. By multiplying each $f_i(x)$ by a
suitable power of x, we may assume that $\deg f_i(x) = d$ for all
$i = 1,\dots,k$.

Now, let

$$g(x) = g_0 + g_1x + \cdots + g_nx^n$$

be any polynomial in $\mathcal{I}$ with $\deg g(x) \geq d$. Since $g_n \in L$, we have

$$g_n = r_1\alpha_1 + \cdots + r_k\alpha_k$$

and so

(5.2)  $h(x) = g(x) - \sum_i r_i x^{n-d} f_i(x)$

has coefficient of $x^n$ equal to 0. In other words, $\deg h(x) < \deg g(x)$.

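We now have the basis for an induction argument. The polynomials
in $\mathcal{I}$ of degree less than d form an R-submodule of the finitely
generated R-module $\langle 1,x,\dots,x^{d-1} \rangle$ and so, by Theorem 5.8, this
submodule is generated over R by finitely many polynomials
$p_1(x),\dots,p_m(x) \in \mathcal{I}$. Hence, any polynomial in $\mathcal{I}$ of degree less than d
is generated by the set

$$S = \{p_1(x),\dots,p_m(x),f_1(x),\dots,f_k(x)\}$$

Assume, for the purposes of induction, that any polynomial in $\mathcal{I}$ of
degree at most $n - 1$ is generated by S. Let $g(x) \in \mathcal{I}$ have degree
$n \geq d$. Referring to (5.2), we see that $\deg h(x) \leq n - 1$, and so
$h(x) \in \mathcal{I}$ is generated by S. But then

$$g(x) = h(x) + \sum_i r_i x^{n-d} f_i(x)$$

is also generated by S. ■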
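
The degree-lowering step at the heart of this proof is ordinary polynomial arithmetic, and a small sketch over the integers may make it concrete. Here polynomials are coefficient lists indexed by the power of x, the polynomials f_i all have the same degree d, and the multipliers r_i are supplied by the caller so that the leading coefficient of g equals the sum of r_i times the leading coefficient of f_i. The representation and the function names are assumptions made for illustration, and the code only demonstrates the cancellation of the leading term.

```python
def degree(p) -> int:
    """Degree of a coefficient list (index = power of x); -1 for zero."""
    for k in range(len(p) - 1, -1, -1):
        if p[k] != 0:
            return k
    return -1


def reduce_leading_term(g, fs, rs):
    """Return h = g - sum_i r_i * x^(n - d) * f_i, with n = deg g, d = deg f_i.

    Assumes deg g >= deg f_i for every i, and that the leading coefficient
    of g equals sum_i r_i * (leading coefficient of f_i).
    """
    n = degree(g)
    h = list(g)
    for f, r in zip(fs, rs):
        shift = n - degree(f)
        for k in range(degree(f) + 1):
            h[k + shift] -= r * f[k]
    return h


if __name__ == "__main__":
    f1 = [1, 0, 4]       # 4x^2 + 1
    f2 = [0, 1, 6]       # 6x^2 + x
    g = [5, 0, 0, 2]     # 2x^3 + 5, and 2 = (-1)*4 + 1*6
    h = reduce_leading_term(g, [f1, f2], [-1, 1])
    print(h, degree(h))  # → [5, 1, -1, 0] 2
```

Repeating this step lowers the degree until it falls below d, which is exactly the induction carried out in the proof.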



EXERCISES 

1. State and prove the first isomorphism theorem for modules. 

2. State and prove the second isomorphism theorem for modules. 

3. State and prove the third isomorphism theorem for modules. 

4. If M is a free R-module, and $\tau: M \to N$ is an epimorphism, must
N also be free?

5. Let I be an ideal of R. Prove that if R/I is a free R-module, 
then I is the zero ideal. 

6. Show that the submodules of the R-module R are the same as the 
ideals of the ring R. 

7. Let R be a commutative ring with identity. An ideal $\mathcal{I}$ in R
is called a prime ideal if $r,s \in R$ and $rs \in \mathcal{I}$ implies that $r \in \mathcal{I}$
or $s \in \mathcal{I}$. Show that $R/\mathcal{I}$ is an integral domain if and only if $\mathcal{I}$
is a prime ideal and $\mathcal{I} \neq R$.

8. Prove that the union of an ascending chain of submodules is a 
submodule. 

9. Prove that a commutative ring R with identity is a field if and 
only if it has no ideals other than {0} and R. 

10. Let S be a submodule of an R-module M. Show that if M is 
finitely generated, so is the quotient module M/S. 

11. Let S be a submodule of an R-module M. Show that if both S
and M/S are finitely generated, so is M.

12. Referring to the proof of Theorem 5.5, show that the necessary
properties of scalar multiplication are satisfied and so $M/\mathcal{I}M$ is
indeed a vector space over $R/\mathcal{I}$.

13. Show that an R-module M satisfies the a.c.c. for submodules if
and only if the following condition holds. Every nonempty
collection $\mathcal{S}$ of submodules of M has a maximal element. That
is, for every nonempty collection $\mathcal{S}$ of submodules of M, there is
an $S \in \mathcal{S}$ with the property that $T \in \mathcal{S}$ and $S \subseteq T$ imply $T = S$.
14. Let $\tau: M \to N$ be a homomorphism of R-modules.

a) Show that if M is finitely generated, then so is $\mathrm{im}(\tau)$.

b) Show that if $\ker(\tau)$ and $\mathrm{im}(\tau)$ are finitely generated, then
M is finitely generated.

15. If R is a noetherian ring, show that any proper ideal of R is 
contained in a maximal ideal. 

16. If R is noetherian, and $\mathcal{I}$ is an ideal of R, show that $R/\mathcal{I}$ is
also noetherian.

17. Prove that if R is noetherian, then so is $R[x_1,\dots,x_n]$.




CHAPTER 6 



Modules over Principal Ideal Domains 



Contents: Free Modules over a Principal Ideal Domain. Torsion
Modules. The Primary Decomposition Theorem. The Cyclic
Decomposition Theorem for Primary Modules. Uniqueness. The
Cyclic Decomposition Theorem. Exercises.



Free Modules over a Principal Ideal Domain 

When a ring R has nice properties (such as being noetherian), 
then its R-modules tend to have nice properties (such as being 
noetherian, at least in the finitely generated case). Since principal ideal 
domains (abbreviated p.i.d.s) have very nice properties, we expect the 
same for modules over p.i.d.s. 

For instance, Example 4.6 showed that a submodule of a free 
module need not be free. However, if the ring of scalars is a principal 
ideal domain, this cannot happen. 

Theorem 6.1 Let M be a free module over a principal ideal domain
R. Then any submodule S of M is also free. Moreover,
$\mathrm{rk}(S) \leq \mathrm{rk}(M)$.

Proof. We will give the proof only for modules of finite rank, although
the theorem is true for all free modules. Thus, since $M \cong R^n$, we may
in fact assume that $M = R^n$. Our plan is to proceed by induction
on n.

For n = 1, we have M = R, and any submodule S of R is just
an ideal of R. Hence, $S = \langle a \rangle$ is principal. If a = 0, then S = {0}
is free on the empty set. Otherwise, since R is an
integral domain, we have $ra \neq 0$ for all $r \neq 0$, and so the map

$$\tau: R \to S, \quad \tau(r) = ra$$

is an isomorphism from R to S. Hence, S is free.

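Now assume that any submodule of $R^k$ is free, for $1 \leq k \leq n - 1$,
and let S be a submodule of $R^n$. Consider the sets

$$S_1 = \{(s_1,\dots,s_{n-1},0) \mid (s_1,\dots,s_{n-1},0) \in S\}$$

and

$$S_2 = \{s_n \mid (s_1,\dots,s_{n-1},s_n) \in S \text{ for some } s_1,\dots,s_{n-1}\}$$

It is easy to see that $S_1$ is a submodule of S and that $S_2$ is an
ideal of R, say $S_2 = \langle a \rangle$. Moreover, $S_1$ is isomorphic to a submodule
of $R^{n-1}$, obtained by simply dropping the last coordinate,

$$S_1 \cong \{(s_1,\dots,s_{n-1}) \mid (s_1,\dots,s_{n-1},0) \in S_1\} \subseteq R^{n-1}$$

Therefore, the induction hypothesis (and the isomorphism) imply that
$S_1$ is free, say on $\{u_1,\dots,u_s\}$, where $s \leq n - 1$. If a = 0, then
$S = S_1$ and we are done. Otherwise, choose $v \in S$ whose last
coordinate is a. If $w \in S$ has last coordinate ra, then $w - rv \in S_1$,
and so $S = S_1 + \langle v \rangle$. Moreover, for $r \neq 0$ the last coordinate ra of
rv is nonzero, and so $S_1 \cap \langle v \rangle = \{0\}$ and $\langle v \rangle$ is free on $\{v\}$. Hence,

$$S = S_1 \oplus \langle v \rangle$$

is free on $\{u_1,\dots,u_s,v\}$, where $s + 1 \leq n$. ■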



Torsion Modules 

In a vector space V over a field F, if $r \in F$ and $v \in V$ are
nonzero, then rv is nonzero. In a module, this need not be the case
and leads to the following definition.

Definition Let M be an R-module. If $v \in M$ has the property that
$rv = 0$ for some nonzero $r \in R$, then v is called a torsion element of
M. A module that has no nonzero torsion elements is said to be torsion
free. If all elements of M are torsion elements, then M is a torsion
module. □

If M is a module over an integral domain, it is not hard to see that
the set $M_{\mathrm{tor}}$ of all torsion elements is a submodule of M, and that
$M/M_{\mathrm{tor}}$ is torsion free. Moreover, any free module over a principal
ideal domain is torsion free. The following is a partial converse.

Theorem 6.2 Let M be a torsion free, finitely generated module over a 
principal ideal domain R. Then M is free. 
Proof. Since M is finitely generated, we have $M = \langle v_1,\dots,v_n \rangle$ for
some $v_i \in M$. Now let us take a maximal linearly independent subset
of these generators, say $S = \{u_1,\dots,u_k\}$, and renumber to get

$$M = \langle u_1,\dots,u_k,v_1,\dots,v_{n-k} \rangle$$

Thus, for each $v_i$, the set $\{u_1,\dots,u_k,v_i\}$ is linearly dependent, and so
there exist $a_i \neq 0$ and $r_1,\dots,r_k$ for which

$$a_iv_i + r_1u_1 + \cdots + r_ku_k = 0$$

(The coefficient $a_i$ must be nonzero because S is linearly independent.)
Now, if we let $a = a_1 \cdots a_{n-k}$ be the product of the coefficients of the
various $v_i$'s, then $av_i \in \mathrm{span}(S)$, for all $i = 1,\dots,n-k$.

Hence, the module $aM = \{av \mid v \in M\}$ is a submodule of
span(S). But span(S) is a free module, with basis S, and so by
Theorem 6.1, aM is also free. Finally, $M \cong aM$, since the map

$$\tau(v) = av$$

is an epimorphism, that happens to be injective, because M is torsion
free. Thus M, being isomorphic to aM, is also free. ■

Our goal in this section is to show that any finitely generated
module M over a principal ideal domain is the direct sum

$$M = M_{\mathrm{tor}} \oplus M_{\mathrm{free}}$$

where $M_{\mathrm{free}}$ is a free module. This is the first step in the
decomposition of a module over a principal ideal domain.

Since the quotient module $M/M_{\mathrm{tor}}$ is torsion free and since
$M/M_{\mathrm{tor}}$ is finitely generated when M is finitely generated, we deduce
from Theorem 6.2 that $M/M_{\mathrm{tor}}$ is a free module. Consider the natural
projection

$$\pi: M \to M/M_{\mathrm{tor}}, \quad \pi(v) = v + M_{\mathrm{tor}}$$

It is tempting to infer (as we would for vector spaces) that M is
isomorphic to the direct sum of $\ker(\pi)$ and $\mathrm{im}(\pi)$, and since
$\ker(\pi) = M_{\mathrm{tor}}$ and $\mathrm{im}(\pi) = M/M_{\mathrm{tor}}$ is free, the desired result would
follow. Happily, this is the case for modules as well.

Theorem 6.3 Let M be a finitely generated module over a principal
ideal domain R. Then

$$M = M_{\mathrm{tor}} \oplus M_{\mathrm{free}}$$

where $M_{\mathrm{free}}$ is a free R-module.

Proof. Consider the epimorphism $\pi: M \to M/M_{\mathrm{tor}}$ from M onto the
free module $M/M_{\mathrm{tor}}$. Let $\mathcal{B}$ be a basis for $M/M_{\mathrm{tor}}$. For each $b \in \mathcal{B}$,
choose a $b' \in M$ with the property that $\pi(b') = b$. Let $\mathcal{B}'$ be the set
of all such elements of M. We leave it to the reader to show that $\mathcal{B}'$
is linearly independent. Hence, $S = \mathrm{span}(\mathcal{B}')$ is a free submodule of
M. Moreover,

$$v \in M_{\mathrm{tor}} \cap S = \ker(\pi) \cap S \;\Rightarrow\; \pi(v) = 0 \ \text{and}\ v = \sum r_ib_i'$$

$$\Rightarrow\; 0 = \sum r_i\pi(b_i') = \sum r_ib_i$$

$$\Rightarrow\; r_i = 0 \ \text{for all } i$$

$$\Rightarrow\; v = 0$$

and so $M_{\mathrm{tor}} \cap S = \{0\}$. Furthermore, if $v \in M$, then $\pi(v) = \sum s_ib_i$, for
some $s_i \in R$. Now let $u = \sum s_ib_i' \in S$. Then

$$\pi(v - u) = \pi(v) - \pi\left(\sum s_ib_i'\right) = \sum s_ib_i - \sum s_ib_i = 0$$

and so $x = v - u \in \ker(\pi)$. Hence, $v = x + u \in \ker(\pi) + S$. This shows
that $M = \ker(\pi) \oplus S = M_{\mathrm{tor}} \oplus S$. ■

In view of Theorem 6.3, we can turn our attention to the 
decomposition of finitely generated torsion modules over a principal 
ideal domain. 

The Primary Decomposition Theorem 

To show that every finitely generated torsion module over a 
principal ideal domain is the direct sum of cyclic submodules, we need 
some definitions. 

Definition Let M be an R-module. The annihilator of $v \in M$ is

$$\mathrm{ann}(v) = \{r \in R \mid rv = 0\}$$

and the annihilator of M is

$$\mathrm{ann}(M) = \{r \in R \mid rM = \{0\}\}$$

where $rM = \{rv \mid v \in M\}$. □

It is easy to see that ann(v) and ann(M) are ideals of R.
Clearly, $v \in M$ is a torsion element if and only if $\mathrm{ann}(v) \neq \{0\}$.

If M is a finitely generated torsion module over a principal ideal
domain, say $M = \langle u_1,\dots,u_n \rangle$, then there exists nonzero $a_i \in \mathrm{ann}(u_i)$,
for $i = 1,\dots,n$. Hence, the nonzero product $a = a_1 \cdots a_n$ satisfies
$av = 0$ for all $v \in M$, and so $a \in \mathrm{ann}(M)$. This shows that
$\mathrm{ann}(M) \neq \{0\}$.

Definition Let M be a finitely generated torsion module over a
principal ideal domain. Any generator of the principal ideal ann(v) is
called an order of v. Any generator of the nonzero principal ideal
ann(M) is called an order of M. □

Annihilators are also referred to as order ideals. Note that any
two orders $\mu$ and $\nu$ of M (or of $v \in M$) are associates, that is,

$$\mathrm{ann}(M) = \langle \mu \rangle = \langle \nu \rangle \;\Rightarrow\; \mu = u\nu \ \text{for some unit } u \in R$$

Hence, an order of M is uniquely determined up to multiplicative unit,
and so $\mu$ and $\nu$ have the same factorization into a product of prime
elements in R, up to multiplication by a unit.
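
For modules over the integers these notions can be computed directly. The sketch below treats a finite direct product of cyclic groups of integers modulo m_1, ..., m_k as a module over the integers, finds the positive order of each element by search, and obtains the positive order of the module as the least common multiple of the element orders. The representation and function names are illustrative assumptions, not notation from the text.

```python
from functools import reduce
from itertools import product
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def element_order(v, moduli) -> int:
    """Smallest positive r with r*v = 0 in Z_{m1} x ... x Z_{mk}."""
    r = 1
    while any((r * x) % m != 0 for x, m in zip(v, moduli)):
        r += 1
    return r


def module_order(moduli) -> int:
    """Positive generator of ann(M), as the lcm of all element orders."""
    elements = product(*(range(m) for m in moduli))
    return reduce(lcm, (element_order(v, moduli) for v in elements), 1)


if __name__ == "__main__":
    print(element_order((1, 1), (4, 6)))   # → 12
    print(module_order((4, 6)))            # → 12
```

In this example the product 4 · 6 = 24 of the orders of the two generators lies in the annihilator, as the argument above shows, but it is the divisor 12 that generates the annihilator.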

Definition A module M is said to be primary if its annihilator has the
form $\mathrm{ann}(M) = \langle p^e \rangle$, where p is a prime and e is a positive integer.
In other words, M is primary if it has order a positive power of a
prime. □

Note that a finitely generated torsion module M over a principal 
ideal domain is primary if and only if every element of M has order a 
power of a fixed prime p. 

Our plan for the decomposition of a torsion module M is to first 
decompose M as a direct sum of primary submodules. 

Theorem 6.4 (The primary decomposition theorem) Let M be a
nonzero finitely generated torsion module over a principal ideal domain,
with order

$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$

where the $p_i$'s are distinct primes. Then M is the direct sum

$$M = M_{p_1} \oplus \cdots \oplus M_{p_n}$$

where

$$M_{p_i} = \{v \in M \mid p_i^{e_i}v = 0\}$$

is a primary submodule, with order $p_i^{e_i}$.

Proof. Let $\mu = pq$, where $\gcd(p,q) = 1$, and consider the sets

$$M_p = \{v \in M \mid pv = 0\} \quad\text{and}\quad M_q = \{v \in M \mid qv = 0\}$$

We wish to show that $M = M_p \oplus M_q$ and that $M_p$ and $M_q$ have
annihilators $\langle p \rangle$ and $\langle q \rangle$, respectively.

Since p and q are relatively prime, there exist $a,b \in R$ such
that

(6.1)  $ap + bq = 1$

(This follows from the fact that the ideal $\langle p,q \rangle$ is generated by
$\gcd(p,q) = 1$, and so $1 \in \langle p,q \rangle$.) Now, if $v \in M_p \cap M_q$, then $pv = qv = 0$
and so

$$v = 1v = (ap + bq)v = 0$$

Thus $M_p \cap M_q = \{0\}$. From (6.1), we also get, for any $v \in M$,

$$v = 1v = apv + bqv$$

Moreover, $q(apv) = a(pq)v = a\mu v = 0$ implies that $apv \in M_q$, and
similarly, $bqv \in M_p$. Hence, $v \in M_p + M_q$.

Now suppose that $rM_p = 0$. Then, for any $v = v_1 + v_2 \in
M_p \oplus M_q = M$, we have

$$rqv = rq(v_1 + v_2) = qrv_1 + rqv_2 = 0$$

and so $rq \in \mathrm{ann}(M)$, which implies that $\mu = pq \mid rq$. Hence $p \mid r$,
which shows that $\mathrm{ann}(M_p) = \langle p \rangle$. Similarly, $\mathrm{ann}(M_q) = \langle q \rangle$.

Finally, since $\mu$ can be written as a product of primes, say

$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$

we can use the preceding argument to write

$$M = M_{p_1} \oplus N$$

where N is a submodule with annihilator $\langle \mu/p_1^{e_1} \rangle$. Repeating the
process gives the desired decomposition. ■
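
The splitting used in this proof is easy to carry out by machine when the ring is the integers and the module is the integers modulo pq with p and q relatively prime. The sketch below computes the coefficients of (6.1) with the extended Euclidean algorithm and then writes each v as the sum of bqv, which is killed by p, and apv, which is killed by q. The function names are illustrative and the choice of module is an assumption made for the example.

```python
def bezout(p: int, q: int):
    """Extended Euclid: return (g, a, b) with a*p + b*q == g == gcd(p, q)."""
    old_r, r = p, q
    old_a, a = 1, 0
    old_b, b = 0, 1
    while r != 0:
        k = old_r // r
        old_r, r = r, old_r - k * r
        old_a, a = a, old_a - k * a
        old_b, b = b, old_b - k * b
    return old_r, old_a, old_b


def split(v: int, p: int, q: int):
    """Write v in Z_{pq} as (component killed by p, component killed by q)."""
    g, a, b = bezout(p, q)
    if g != 1:
        raise ValueError("p and q must be relatively prime")
    n = p * q
    return (b * q * v) % n, (a * p * v) % n


if __name__ == "__main__":
    print(bezout(4, 3))     # → (1, 1, -1)
    print(split(7, 4, 3))   # → (3, 4)
```

In the integers modulo 12 this recovers the decomposition into the submodule {0, 3, 6, 9} of elements killed by 4 and the submodule {0, 4, 8} of elements killed by 3.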



The Cyclic Decomposition Theorem for Primary Modules 

The next step is to decompose primary modules. 



Theorem 6.5 (The cyclic decomposition theorem) Let M be a
nonzero primary finitely generated torsion module over a principal ideal
domain R, with order $p^e$. Then M is the direct sum

(6.2)  $M = C_1 \oplus \cdots \oplus C_n$

of cyclic submodules, with orders $p^{e_1},\dots,p^{e_n}$ satisfying

(6.3)  $e = e_1 \geq e_2 \geq \cdots \geq e_n$

or, equivalently,

(6.4)  $p^{e_n} \mid p^{e_{n-1}} \mid \cdots \mid p^{e_1}$

Proof. Once (6.2) is established, (6.4) will follow easily, since

$$p^e \in \mathrm{ann}(M) \subseteq \mathrm{ann}(C_i)$$

and so if $\mathrm{ann}(C_i) = \langle a_i \rangle$, then $a_i \mid p^e$. Hence, $a_i = p^{e_i}$ for some
$e_i \leq e$. Then we may rearrange the order of the summands to get (6.4).
To prove (6.2), we begin by observing that there is an element
$v_1 \in M$ with $\mathrm{ann}(v_1) = \mathrm{ann}(M) = \langle p^e \rangle$. For if not, then for all $v \in M$,
we would have $\mathrm{ann}(v) = \langle p^k \rangle$ with $k < e$, and so $p^{e-1} \in \mathrm{ann}(M)$. But
this implies that $p^e \mid p^{e-1}$, which is impossible.

Our goal now is to show that the cyclic submodule $\langle v_1 \rangle$ is a
direct summand of M, that is,

(6.5)  $M = \langle v_1 \rangle \oplus S$

for some submodule S of M. Then since S is also a finitely
generated primary torsion module over R, we can repeat the process, to
get

$$M = \langle v_1 \rangle \oplus \langle v_2 \rangle \oplus S_2$$

where $\mathrm{ann}(v_2) = \langle p^{e_2} \rangle$ with $e_2 \leq e_1$. Continuing in this way, we get
an ascending sequence of submodules

$$\langle v_1 \rangle \subseteq \langle v_1 \rangle \oplus \langle v_2 \rangle \subseteq \cdots$$

which must terminate, since M satisfies the ascending chain condition
on submodules.

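Thus, we need only establish (6.5). Since this is the most involved
part of the proof, we will approach it slowly. Since M is finitely
generated, we may write $M = \langle v_1,u_1,\dots,u_k \rangle$. Our argument will be by
induction on k. If k = 0, then let S = {0}, and we are done. Assume
that the result is true for k, and suppose that

$$M = \langle v_1,u_1,\dots,u_k,u \rangle$$

By the induction hypothesis,

$$\langle v_1,u_1,\dots,u_k \rangle = \langle v_1 \rangle \oplus S_0$$

for some submodule $S_0$.

Notice that we may replace u by any element of the form
$u - \alpha v_1$, for $\alpha \in R$, without affecting the span, that is,

$$\langle v_1,u_1,\dots,u_k,u - \alpha v_1 \rangle = \langle v_1,u_1,\dots,u_k,u \rangle = M$$

and so we seek an $\alpha \in R$ for which

(6.6)  $\langle v_1 \rangle \cap \langle u - \alpha v_1, S_0 \rangle = \{0\}$

since then we would have

$$M = \langle v_1 \rangle \oplus \langle u - \alpha v_1, S_0 \rangle = \langle v_1 \rangle \oplus S$$

Since any element of $\langle u - \alpha v_1, S_0 \rangle$ has the form $r(u - \alpha v_1) + s_0$,
equation (6.6) is equivalent to

$$r(u - \alpha v_1) + s_0 \in \langle v_1 \rangle \;\Rightarrow\; r(u - \alpha v_1) + s_0 = 0$$

for any $r \in R$ and $s_0 \in S_0$, or equivalently

$$r(u - \alpha v_1) \in \langle v_1 \rangle \oplus S_0 \;\Rightarrow\; r(u - \alpha v_1) \in S_0$$

or, finally,

(6.7)  $ru \in \langle v_1 \rangle \oplus S_0 \;\Rightarrow\; r(u - \alpha v_1) \in S_0$

Now, we observe that the set

$$\mathcal{I} = \{r \in R \mid ru \in \langle v_1 \rangle \oplus S_0\}$$

is an ideal of R, and so it is principal, say $\mathcal{I} = \langle a \rangle$. Note, however,
that

$$p^eu = 0 \in \langle v_1 \rangle \oplus S_0$$

and so $p^e \in \langle a \rangle$, which implies that $a \mid p^e$, or that $a = p^f$, for some
$f \leq e$. Thus, we have

$$ru \in \langle v_1 \rangle \oplus S_0 \;\Rightarrow\; r \in \mathcal{I} \;\Rightarrow\; r = qp^f \ \text{for some } q \in R$$

$$\Rightarrow\; r(u - \alpha v_1) = qp^f(u - \alpha v_1)$$

and so if we can find an $\alpha \in R$ for which

(6.8)  $p^f(u - \alpha v_1) \in S_0$

then (6.7) will be satisfied.

But $p^f \in \mathcal{I}$ and so $p^fu \in \langle v_1 \rangle \oplus S_0$, say

(6.9)  $p^fu = tv_1 + s_0$

for some $t \in R$ and $s_0 \in S_0$. Equation (6.8) then becomes

$$tv_1 + s_0 - \alpha p^fv_1 \in S_0$$

or

$$(t - \alpha p^f)v_1 \in S_0$$

Since $\langle v_1 \rangle \cap S_0 = \{0\}$, this holds if and only if $(t - \alpha p^f)v_1 = 0$,
which is certainly the case when $t - \alpha p^f = 0$. Such an $\alpha$ exists if and
only if $p^f \mid t$, which is what we must show.

Equation (6.9) implies that

$$0 = p^{e-f}p^fu = p^{e-f}tv_1 + p^{e-f}s_0$$

and since $\langle v_1 \rangle \cap S_0 = \{0\}$, we deduce that $p^{e-f}tv_1 = 0$. Therefore,
since $v_1$ has order $p^e$, it follows that $p^e \mid p^{e-f}t$, that is, $p^f \mid t$, as
desired. This completes the proof. ■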

Uniqueness 

Although the decomposition (6.2) is not unique, we will see that
the orders $p^{e_i}$ are unique up to multiplication by a unit. The prime p
is certainly unique, since it must divide the order $p^e$ of M. Before
proceeding further, we need a few preliminary results, whose proofs are
left as exercises.
Lemma 6.6 Let R be a principal ideal domain.

1) If $\langle v \rangle$ is a cyclic R-module, with $\mathrm{ann}(v) = \langle a \rangle$, then the map

$$\tau: R \to \langle v \rangle, \quad \tau(r) = rv$$

is an epimorphism between R-modules, with kernel $\langle a \rangle$, and so

$$\langle v \rangle \cong R/\langle a \rangle$$

Moreover, if a is a prime, then $\langle a \rangle$ is a maximal ideal in R,
and so $R/\langle a \rangle$ is a field.

2) Let $p \in R$ be a prime. If M is an R-module for which $pM = \{0\}$,
then M is a vector space over $R/\langle p \rangle$, where scalar
multiplication is defined by

$$(r + \langle p \rangle)v = rv$$

for all $v \in M$.

3) Let $p \in R$ be a prime. For any submodule S of an R-module
M, the set

$$S^{(p)} = \{v \in S \mid pv = 0\}$$

is a submodule of M. Moreover, if $M = S \oplus T$, then
$M^{(p)} = S^{(p)} \oplus T^{(p)}$. ■

Now we are ready for the uniqueness result. 

Theorem 6.7 Let M be a nonzero primary finitely generated torsion
module over a principal ideal domain R, with order $p^e$. Suppose that

$$M = C_1 \oplus \cdots \oplus C_n$$

where the $C_i$ are nonzero cyclic submodules with orders $p^{e_i}$ and
$e_1 \geq \cdots \geq e_n$. Then if

$$M = D_1 \oplus \cdots \oplus D_m$$

where the $D_i$ are nonzero cyclic submodules with orders $p^{f_i}$ and
$f_1 \geq \cdots \geq f_m$, we must have n = m and

$$e_1 = f_1,\ \dots,\ e_n = f_n$$

Proof. Let us begin by showing that n = m. According to part (3) of
Lemma 6.6,

$$M^{(p)} = C_1^{(p)} \oplus \cdots \oplus C_n^{(p)}$$

and

$$M^{(p)} = D_1^{(p)} \oplus \cdots \oplus D_m^{(p)}$$

where each summand in both decompositions is nonzero. (Why?)
Since $pM^{(p)} = \{0\}$ by definition, Lemma 6.6 implies that $M^{(p)}$ is a
vector space over $R/\langle p \rangle$, and so each of the preceding decompositions
expresses $M^{(p)}$ as a direct sum of one-dimensional subspaces. Hence,
m = n.

Next, we show that the exponents $e_i$ and $f_i$ are equal by using
induction on $e_1$. Assume first that $e_1 = 1$, in which case $e_i = 1$ for
all i. Then $pM = \{0\}$, and so $f_i = 1$ for all i, since if $f_i > 1$, and if
$D_i = \langle w \rangle$, then $pw \neq 0$, which is not the case.

Now suppose the result is true whenever $e_1 \leq k - 1$, and let
$e_1 = k$. To isolate the exponents that equal 1, suppose that

$$(e_1,\dots,e_n) = (e_1,\dots,e_s,1,\dots,1), \quad e_s > 1$$

and

$$(f_1,\dots,f_n) = (f_1,\dots,f_t,1,\dots,1), \quad f_t > 1$$

Then

$$pM = pC_1 \oplus \cdots \oplus pC_s$$

and

$$pM = pD_1 \oplus \cdots \oplus pD_t$$

But $pC_i$ is a cyclic submodule of M, and $\mathrm{ann}(pC_i) = \langle p^{e_i - 1} \rangle$. To see
this, suppose that $C_i = \langle v_i \rangle$. Then

$$pC_i = \{pc \mid c \in C_i\} = \{prv_i \mid r \in R\} = \{r(pv_i) \mid r \in R\} = \langle pv_i \rangle$$

and $pv_i$ has order $p^{e_i - 1}$. Similarly, $pD_i$ is a cyclic submodule of
M, with $\mathrm{ann}(pD_i) = \langle p^{f_i - 1} \rangle$. In particular, $\mathrm{ann}(pC_1) = \langle p^{e_1 - 1} \rangle$, and so,
by the induction hypothesis, we have

$$s = t \quad\text{and}\quad e_1 = f_1,\ \dots,\ e_s = f_s$$

which concludes the proof. ■
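
The first half of this argument, counting summands through the elements killed by p, can be tried on small examples over the integers. The sketch below takes a direct product of cyclic groups of integers modulo m_1, ..., m_k, counts the elements v with pv = 0, and reads off the dimension of that set as a vector space over the integers modulo p. The representation and function names are illustrative assumptions.

```python
from itertools import product


def p_torsion_count(moduli, p: int) -> int:
    """Number of v in Z_{m1} x ... x Z_{mk} with p*v = 0."""
    return sum(
        1
        for v in product(*(range(m) for m in moduli))
        if all((p * x) % m == 0 for x, m in zip(v, moduli))
    )


def number_of_summands(moduli, p: int) -> int:
    """Exponent n with p**n equal to the number of elements killed by p.

    This is the dimension over Z_p of the subspace of elements killed by p.
    """
    count = p_torsion_count(moduli, p)
    n = 0
    while count > 1:
        if count % p != 0:
            raise ValueError("count is not a power of p")
        count //= p
        n += 1
    return n


if __name__ == "__main__":
    for moduli in [(8,), (4, 2), (2, 2, 2)]:
        print(moduli, number_of_summands(moduli, 2))
```

The three modules in the example all have eight elements, yet the counts 1, 2 and 3 distinguish them, in line with the uniqueness of the number of cyclic summands.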



The Cyclic Decomposition Theorem 

Let us pause to see where we stand. If M is a finitely generated
module over a principal ideal domain then, according to Theorem 6.3,

$$M = M_{\mathrm{tor}} \oplus M_{\mathrm{free}}$$

where $M_{\mathrm{tor}}$ is the submodule of all torsion elements and $M_{\mathrm{free}}$ is a
free submodule of M. If $M_{\mathrm{tor}}$ has order

$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$

where the $p_i$'s are distinct primes, the primary decomposition theorem
implies that

$$M_{\mathrm{tor}} = M_{p_1} \oplus \cdots \oplus M_{p_n}$$

where $M_{p_i}$ is a primary module, with order $p_i^{e_i}$. Hence,

$$M = M_{p_1} \oplus \cdots \oplus M_{p_n} \oplus M_{\mathrm{free}}$$

Finally, the cyclic decomposition theorem for primary modules allows
us to write each primary module $M_{p_i}$ as a direct sum of cyclic
submodules. Let us put the pieces together in one theorem.

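Theorem 6.8 (The cyclic decomposition theorem for finitely generated
modules over a principal ideal domain: elementary divisor version)
Let M be a nonzero finitely generated module over a principal ideal
domain R. Then

$$M = M_{\mathrm{tor}} \oplus M_{\mathrm{free}}$$

where $M_{\mathrm{tor}}$ is the set of all torsion elements in M, and $M_{\mathrm{free}}$ is a
free module, whose rank is uniquely determined by the module M. If
$M_{\mathrm{tor}}$ has order

$$\mu = p_1^{e_1} \cdots p_n^{e_n}$$

where the $p_i$'s are distinct primes, then $M_{\mathrm{tor}}$ is the direct sum

$$M_{\mathrm{tor}} = M_{p_1} \oplus \cdots \oplus M_{p_n}$$

where

$$M_{p_i} = \{v \in M \mid p_i^{e_i}v = 0\}$$

is a primary submodule, with order $p_i^{e_i}$.

Moreover, each $M_{p_i}$ can be further decomposed into a direct sum
of cyclic submodules

$$M_{p_i} = C_{i,1} \oplus \cdots \oplus C_{i,k_i}$$

with orders $p_i^{e_{i,1}},\dots,p_i^{e_{i,k_i}}$ and where

$$e_i = e_{i,1} \geq e_{i,2} \geq \cdots \geq e_{i,k_i}$$

The orders $p_i^{e_{i,j}}$, called the elementary divisors of M, are uniquely
determined, up to multiplication by a unit, by the module M.

This yields the decomposition of M into a direct sum of cyclic
submodules (and a free summand)

(6.10)  $M = (C_{1,1} \oplus \cdots \oplus C_{1,k_1}) \oplus \cdots \oplus (C_{n,1} \oplus \cdots \oplus C_{n,k_n}) \oplus M_{\mathrm{free}}$ ■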



The decomposition of $M$ can be formulated in a slightly different
way by observing that if $S$ and $T$ are cyclic submodules of $M$, and if
$\operatorname{ann}(S) = \langle a \rangle$ and $\operatorname{ann}(T) = \langle b \rangle$ where $\gcd(a,b) = 1$, then
$S \cap T = \{0\}$ and $S \oplus T$ is a cyclic submodule with $\operatorname{ann}(S \oplus T) = \langle ab \rangle$.

With this in mind, the summands in (6.10) can be collected as
follows. Let $D_1$ be the direct sum of the first summands in each group
of summands in (6.10) (by a group of summands, we mean the
summands associated with a given prime $p_i$). Thus,

$$D_1 = C_{1,1} \oplus \cdots \oplus C_{n,1}$$

This cyclic submodule has order

$$q_1 = \prod_i p_i^{e_{i,1}}$$

Similarly, let $D_2$ be the direct sum of the second summands in each
group in (6.10). (If a group does not have a second summand, we
simply skip that group.) This gives us the following decomposition.

Theorem 6.9 (The cyclic decomposition theorem for finitely generated
modules over a principal ideal domain: invariant factor version) Let
$M$ be a finitely generated module over a principal ideal domain $R$.
Then

$$M = D_1 \oplus \cdots \oplus D_m \oplus M_{\mathrm{free}}$$

where $M_{\mathrm{free}}$ is a free submodule, and $D_i$ is a cyclic submodule of $M$,
with order $q_i$, where

$$q_m \mid q_{m-1}, \quad q_{m-1} \mid q_{m-2}, \quad \ldots, \quad q_2 \mid q_1$$

Moreover, the scalars $q_i$, called the invariant factors of $M$, are
uniquely determined, up to multiplication by a unit, by the module $M$.
Also, the rank of $M_{\mathrm{free}}$ is uniquely determined by $M$. ∎
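
The passage from elementary divisors to invariant factors is purely
combinatorial, and it may help to see it carried out. The following is
a minimal sketch in Python for the case $R = \mathbb{Z}$, where an elementary
divisor $p^e$ is stored as a pair (p, e). The function name
invariant_factors and this input format are our own illustrative
choices, not notation from the text.

```python
from collections import defaultdict


def invariant_factors(elementary_divisors):
    """Collect elementary divisors (prime, exponent) into invariant factors.

    Returns [q_1, q_2, ..., q_m] with q_m | ... | q_2 | q_1.
    """
    # Group the exponents by prime, largest exponent first.
    groups = defaultdict(list)
    for prime, exponent in elementary_divisors:
        groups[prime].append(exponent)
    for exponents in groups.values():
        exponents.sort(reverse=True)

    # q_k is the product of the k-th summand order in each group,
    # skipping any group that has no k-th summand.
    longest = max((len(e) for e in groups.values()), default=0)
    factors = []
    for k in range(longest):
        q = 1
        for prime, exponents in groups.items():
            if k < len(exponents):
                q *= prime ** exponents[k]
        factors.append(q)
    return factors


# Torsion Z-module with elementary divisors 2^3, 2, 3^2, 3, 5.
divisors = [(2, 3), (2, 1), (3, 2), (3, 1), (5, 1)]
print(invariant_factors(divisors))  # [360, 6]
```

Here $q_1 = 2^3 \cdot 3^2 \cdot 5 = 360$ and $q_2 = 2 \cdot 3 = 6$, and indeed $q_2 \mid q_1$.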



EXERCISES 

1. Referring to the proof of Theorem 6.1, why may we assume that
$M = R^n$?

2. Provide an example of an $R$-module $M$ in which, for some
$0 \neq r \in R$ and $0 \neq u \in M$, we have $ru = 0$.

3. Show that, for any module $M$, the set $M_{\mathrm{tor}}$ of all torsion
elements in $M$ is a submodule of $M$.

4. Show that, for any module $M$, the quotient module $M/M_{\mathrm{tor}}$ is
torsion free.

5. Show that any free module over a principal ideal domain is torsion 
free. 

6. Referring to the proof of Theorem 6.3, show that is linearly 
independent. 

7. Let $M$ be an $R$-module. Show that $\operatorname{ann}(v)$ and $\operatorname{ann}(M)$ are ideals
of $R$.

8. Let $M$ be a module over a p.i.d. $R$. If $\mu$ and $\nu$ are both
orders of $M$, show that $\mu$ and $\nu$ are associates.

9. What is the order of the zero element in a module? What is the
order of $1 \in M$?







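10. Let $R$ be a principal ideal domain. Show that the ideal $\langle p, q \rangle$
generated by $p$ and $q$ is also the ideal generated by $\gcd(p,q)$.

11. Let $M$ be an $R$-module. Prove that $\operatorname{ann}(M) \subseteq \operatorname{ann}(v)$, for any
$v \in M$. Furthermore, when $R$ is a principal ideal domain, and
$\operatorname{ann}(v) = \langle \nu \rangle$ and $\operatorname{ann}(M) = \langle \mu \rangle$, then $\nu \mid \mu$.

12. Prove Lemma 6.6.

13. Show that if $S$ and $T$ are cyclic submodules of $M$, and if
$\operatorname{ann}(S) = \langle a \rangle$ and $\operatorname{ann}(T) = \langle b \rangle$ with $\gcd(a,b) = 1$, then
$S \cap T = \{0\}$ and $S \oplus T$ is a cyclic submodule with
$\operatorname{ann}(S \oplus T) = \langle ab \rangle$. Hint: use the fact that there exist $p, q \in R$
such that $pa + qb = 1$.

14. Show that $\operatorname{ann}(M) \subseteq \operatorname{ann}(v)$, for any $v \in M$.

15. Show that, when $R$ is a principal ideal domain, and $\operatorname{ann}(v) = \langle \nu \rangle$
and $\operatorname{ann}(M) = \langle \mu \rangle$, then $\nu \mid \mu$. In words, an order of $v \in M$
divides an order of $M$.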




CHAPTER 7 

The Structure of a Linear Operator 



Contents: A Brief Review. The Module Associated with a Linear 

Operator. Submodules and Invariant Subspaces. Orders and The 
Minimal Polynomial. Cyclic Submodules and Cyclic Subspaces. 
Summary. The Decomposition of V. The Rational Canonical Form. 
Exercises. 



In this chapter, we study the structure of a linear operator on a finite 
dimensional vector space, using the powerful module decomposition 
theorems of the previous chapter. Unless otherwise noted, all vector 
spaces will be assumed to be finite dimensional. 



A Brief Review 

We have seen that any linear operator on a finite dimensional 
vector space can be represented by matrix multiplication. Let us 
restate Theorem 2.13 for linear operators. 

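Theorem 7.1 Let $\tau \in \mathcal{L}(V)$, and let $\mathcal{B} = (b_1, \ldots, b_n)$ be an ordered
basis for $V$. Then $\tau$ can be represented by a linear operator
$\tau_A \in \mathcal{L}(F^n)$, that is,

$$[\tau(v)]_{\mathcal{B}} = \tau_A([v]_{\mathcal{B}})$$

where $A = [\tau]_{\mathcal{B}}$ is the matrix whose $i$th column is $[\tau(b_i)]_{\mathcal{B}}$. Thus,

$$[\tau(v)]_{\mathcal{B}} = [\tau]_{\mathcal{B}} [v]_{\mathcal{B}}$$

∎

Since the matrix $[\tau]_{\mathcal{B}}$ depends on the ordered basis $\mathcal{B}$, it is
natural to wonder how to choose this basis in order to make the matrix
$[\tau]_{\mathcal{B}}$ as simple as possible, and that is the subject of this chapter.

Let us also restate the relationship between the matrices of $\tau$
with respect to different ordered bases.

Theorem 7.2 Let $\tau \in \mathcal{L}(V)$, and let $\mathcal{B}$ and $\mathcal{B}'$ be ordered bases
for $V$. Then the matrix of $\tau$ with respect to $\mathcal{B}'$ can be expressed in
terms of the matrix of $\tau$ with respect to $\mathcal{B}$ as follows

$$[\tau]_{\mathcal{B}'} = M_{\mathcal{B},\mathcal{B}'} \, [\tau]_{\mathcal{B}} \, M_{\mathcal{B},\mathcal{B}'}^{-1}$$

where $M_{\mathcal{B},\mathcal{B}'}$ is the change of basis matrix, whose $i$th column is
$[b_i]_{\mathcal{B}'}$, where $\mathcal{B} = (b_1, \ldots, b_n)$. ∎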

Finally, we recall the definition of similarity, and its relevance to 
the current discussion. 

Definition Two matrices $A$ and $B$ are similar if there exists an
invertible matrix $P$ for which

$$B = PAP^{-1}$$

The equivalence classes associated with similarity are called similarity
classes. □

Theorem 7.3 The following statements are equivalent for matrices $A$
and $B$.

1) If $A$ represents a linear operator $\tau: V \to V$ with respect to an
ordered basis $\mathcal{B}$, then $B$ also represents $\tau$, but perhaps with
respect to a different ordered basis. That is, if

$$A = [\tau]_{\mathcal{B}}$$

then there exists an ordered basis $\mathcal{B}'$ for which

$$B = [\tau]_{\mathcal{B}'}$$

2) $A$ and $B$ are similar. ∎

According to Theorem 7.3, the matrices that represent a given
linear operator $\tau \in \mathcal{L}(V)$ are precisely the matrices that lie in a
particular similarity class. Hence, in order to best represent $\tau$, we seek
a simple representative of that similarity class. More generally, in order
to represent all linear operators on $V$, we would like to find a simple
representative of each similarity class, that is, a set of simple canonical
forms for similarity.

Let us recall the definition of canonical form.

Definition Let $\sim$ be an equivalence relation on $S$. A subset $C \subseteq S$ is
said to be a set of canonical forms for $\sim$ if for every $s \in S$, there is
exactly one $c \in C$ such that $c \sim s$. □

Now, the simplest type of matrices are probably the diagonal 
matrices. However, not all linear operators can be represented by 
diagonal matrices. In other words, the set of diagonal matrices does not 
form a set of canonical forms for similarity. 

This gives rise to two different directions for further study. First, 
we can search for a characterization of those linear operators that can 
be represented by diagonal matrices. Such operators are called 
diagonalizable. Second, we can search for a different type of “simple” 
matrix that does provide a set of canonical forms for similarity. We 
will pursue both of these directions at the same time. 



The Module Associated with a Linear Operator 

Throughout this chapter, we fix a nonzero linear operator
$\tau \in \mathcal{L}(V)$, and think of $V$ not only as a vector space over a field $F$,
but also as a module over $F[x]$ (as described in Chapter 4), with scalar
multiplication defined by

$$p(x)v = p(\tau)(v)$$

Our plan is to translate the language of the previous chapter into the
language of $V$, by relating module concepts and vector space concepts.

First, since $V$ is a finite dimensional vector space, the module $V$
is a torsion module. To see this, let $n = \dim(V)$ and observe that the
vector space $\mathcal{L}(V)$, being isomorphic to the space of $n \times n$ matrices
over $F$, has dimension $n^2$. Hence, the $n^2 + 1$ vectors

$$\iota, \tau, \tau^2, \ldots, \tau^{n^2}$$

(where $\iota$ is the identity operator) are linearly dependent, which
implies that $p(\tau) = 0$ for some nonzero polynomial $p(x) \in F[x]$. Hence,
$p(x)V = \{0\}$, and so all elements of $V$ are torsion elements.

Also, $V$ is finitely generated as a module. For if $\mathcal{B} =
\{v_1, \ldots, v_n\}$ is a basis for the vector space $V$, then every vector $v \in V$
is a linear combination

$$v = r_1 v_1 + \cdots + r_n v_n$$

where $r_i \in F \subseteq F[x]$, and so $\mathcal{B}$ generates the module $V$.

Hence, $V$ is a finitely generated torsion module over a principal
ideal domain $F[x]$, and so we may apply the decomposition theorems of
the previous chapter.



Submodules and Invariant Subspaces

There is a simple connection between the submodules of the
module $V$ and the subspaces of the vector space $V$. Recall that a
subspace $S$ of $V$ is invariant under $\tau$ if $\tau(S) \subseteq S$.

Theorem 7.4 A subset $S$ of $V$ is a submodule of the $F[x]$-module $V$
if and only if it is an invariant subspace of the vector space $V$. ∎

Theorem 7.4 raises an issue that we should address. Namely, a
submodule $S$ of $V$ can be made into an $F[x]$-module in two ways: as
a submodule of $V$, and as a module using the restriction $\tau|_S : S \to S$ of
$\tau$ to $S$. However, since

$$p(\tau)(s) = p(\tau|_S)(s)$$

for all $s \in S$, scalar multiplication is the same in both cases, and so
these two modules are identical.



Orders and the Minimal Polynomial 

Next, consider the annihilator of $V$

$$\operatorname{ann}(V) = \{p(x) \in F[x] \mid p(x)V = \{0\}\}$$

which is a nonzero principal ideal of $F[x]$. Since all orders of $V$ (that
is, generators of $\operatorname{ann}(V)$) are associates, and since the units of $F[x]$ are
precisely the nonzero elements of $F$, there is a unique monic order
of $V$. This leads to the following definition.

Definition The unique monic order of the module $V$, that is, the
unique monic polynomial that generates $\operatorname{ann}(V)$, is called the minimal
polynomial for $\tau$. We denote this polynomial by $m_\tau(x)$, or $\min(\tau)$.
Thus,

$$\operatorname{ann}(V) = \langle m_\tau(x) \rangle$$

and

$$p(x)V = \{0\} \text{ if and only if } m_\tau(x) \mid p(x)$$

or, equivalently

$$p(\tau) = 0 \text{ if and only if } m_\tau(x) \mid p(x)$$

□

In treatments of linear algebra that do not emphasize the role of
the module $V$, the minimal polynomial of a linear operator $\tau$ is
simply defined as the unique monic polynomial $m_\tau(x)$ of smallest
degree for which $m_\tau(\tau) = 0$. It is not hard to see that this is
equivalent to our definition.

The connection between order and minimal polynomial carries
over to submodules as well.

Theorem 7.5 Let $S$ be a submodule of the module $V$. Then the
monic order of $S$ is the minimal polynomial of the restriction $\tau|_S$.

Proof. This follows from the fact that, if $q(x)$ is the monic order of
$S$, then

$$\langle q(x) \rangle = \operatorname{ann}(S) = \{p(x) \mid p(x)S = \{0\}\} = \{p(x) \mid p(\tau|_S)(S) = \{0\}\}$$

and so $q(x)$ is the minimal polynomial of the restriction $\tau|_S$. ∎

The concept of minimal polynomial is also defined for matrices.
In particular, if $A$ is a square matrix over $F$, the minimal polynomial
$m_A(x)$ of $A$ is the unique monic polynomial $p(x) \in F[x]$ of smallest
degree for which $p(A) = 0$. We leave it to the reader to verify that this
concept is well-defined, and that the following holds.

Theorem 7.6

1) If $A$ and $B$ are similar matrices, then $m_A(x) = m_B(x)$. Thus,
the minimal polynomial is an invariant under similarity.

2) The minimal polynomial of $\tau \in \mathcal{L}(V)$ is the same as the minimal
polynomial of any matrix that represents $\tau$. ∎
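
The definition of the minimal polynomial of a matrix suggests a direct,
if inefficient, way to compute it: list the powers $I, A, A^2, \ldots$ and
stop at the first power that is a linear combination of the earlier
ones. The following is a sketch of that idea in Python over the
rationals, using exact arithmetic from the standard fractions module.
The function names are our own, and this is meant only as an
illustration of the definition, not as a practical algorithm.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def solve_combination(vectors, target):
    """Return c with sum(c[i] * vectors[i]) == target, or None if impossible."""
    rows, cols = len(target), len(vectors)
    # Augmented matrix: one column per vector, last column is the target.
    M = [[vectors[j][i] for j in range(cols)] + [target[i]]
         for i in range(rows)]
    pivots = []
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # The system is inconsistent if a zero row has a nonzero right side.
    if any(M[i][cols] != 0 for i in range(r, rows)):
        return None
    coeffs = [Fraction(0)] * cols
    for row, c in enumerate(pivots):
        coeffs[c] = M[row][cols]
    return coeffs


def minimal_polynomial(A):
    """Coefficients [a_0, ..., a_{k-1}, 1] of the monic minimal polynomial."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = []  # flattened copies of I, A, A^2, ...
    while True:
        flat = [x for row in power for x in row]
        coeffs = solve_combination(powers, flat)
        if coeffs is not None:
            # A^k = sum c_i A^i, so m(x) = x^k - sum c_i x^i.
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(flat)
        power = mat_mul(power, A)


# diag(2, 2, 3) has minimal polynomial (x - 2)(x - 3) = x^2 - 5x + 6.
print(minimal_polynomial([[2, 0, 0], [0, 2, 0], [0, 0, 3]]))
```

The loop must stop, since the powers of $A$ live in a space of dimension $n^2$.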

Cyclic Submodules and Cyclic Subspaces 

Consider the cyclic submodule 

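$$\langle v \rangle = \{p(x)v \mid p(x) \in F[x]\}$$

and suppose that it has monic order $m(x)$. Thus, $m(x)$ is the
minimal polynomial of the restriction $\sigma = \tau|_{\langle v \rangle}$. If

$$m(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$

then

$$\mathcal{B} = (v, xv, \ldots, x^{n-1}v) = (v, \sigma(v), \ldots, \sigma^{n-1}(v))$$

is an ordered basis for the vector space $\langle v \rangle$. To see that $\mathcal{B}$ is linearly
independent, suppose there exist scalars $r_0, \ldots, r_{n-1}$ for which

$$r_0 v + r_1 xv + \cdots + r_{n-1}x^{n-1}v = 0$$

that is,

$$(r_0 + r_1 x + \cdots + r_{n-1}x^{n-1})v = 0$$

Then

$$(r_0 + r_1 x + \cdots + r_{n-1}x^{n-1})\langle v \rangle = \{0\}$$

and so $m(x) \mid r_0 + r_1 x + \cdots + r_{n-1}x^{n-1}$. Since the polynomial on the
right has degree less than $n = \deg m(x)$, this implies that $r_i = 0$ for
all $i = 0, \ldots, n-1$.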

To see that $\mathcal{B}$ spans $\langle v \rangle$, observe that all elements of $\langle v \rangle$ have
the form $p(x)v$, for some polynomial $p(x) \in F[x]$. However, dividing
$p(x)$ by the minimal polynomial $m(x)$ gives

$$p(x) = q(x)m(x) + r(x)$$

where $\deg r(x) < \deg m(x)$. Since $m(x)v = 0$, we have

$$p(x)v = q(x)m(x)v + r(x)v = r(x)v$$

which shows that all elements of $\langle v \rangle$ have the form $r(x)v$, where
$\deg r(x) < \deg m(x) = n$. In symbols,

$$\langle v \rangle = \{r(x)v \mid \deg r(x) < \deg m(x)\}$$

Hence, if

$$r(x) = r_0 + r_1 x + \cdots + r_{n-1}x^{n-1}$$

we have

$$r(x)v = r_0 v + r_1 xv + \cdots + r_{n-1}x^{n-1}v \in \operatorname{span}(\mathcal{B})$$

Thus, $\mathcal{B}$ is an ordered basis for $\langle v \rangle$.

To determine the matrix $[\sigma]_{\mathcal{B}}$ of $\sigma$ with respect to $\mathcal{B}$, observe
that

$$\sigma(\sigma^i(v)) = \sigma^{i+1}(v)$$

for $i = 0, \ldots, n-2$, and so $\sigma$ simply “shifts” each basis vector in $\mathcal{B}$,
except the last one, to the next basis vector in $\mathcal{B}$. Also, $m(\sigma) = 0$
implies that

$$\sigma(\sigma^{n-1}(v)) = \sigma^n(v) = -(a_0 + a_1\sigma + \cdots + a_{n-1}\sigma^{n-1})(v) = -a_0 v - a_1\sigma(v) - \cdots - a_{n-1}\sigma^{n-1}(v)$$

Hence, the matrix of $\sigma$, with respect to $\mathcal{B}$, is

$$C[m(x)] = \begin{bmatrix}
0 & 0 & \cdots & 0 & -a_0 \\
1 & 0 & \cdots & 0 & -a_1 \\
0 & 1 & \cdots & 0 & -a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & -a_{n-1}
\end{bmatrix}$$

This matrix is known as the companion matrix for the polynomial
$m(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$.
Note that companion matrices are defined only for monic polynomials.
Let us summarize, beginning with a definition.

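Definition Let $\tau \in \mathcal{L}(V)$. A subspace $S$ of $V$ is $\tau$-cyclic if there
exists a vector $v \in S$ for which the set

$$\{v, \tau(v), \ldots, \tau^{m-1}(v)\}$$

is a basis for $S$, where $m = \dim(S)$. □

Theorem 7.7

1) A subset $S \subseteq V$ is a cyclic submodule of $V$ if and only if it is a
$\tau$-cyclic subspace of $V$.

2) Suppose that $\langle v \rangle$ is a cyclic submodule of $V$. If the monic order
of $\langle v \rangle$ (that is, the minimal polynomial of $\sigma = \tau|_{\langle v \rangle}$) is

$$m_\sigma(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$$

then

$$\mathcal{B} = (v, xv, \ldots, x^{n-1}v) = (v, \sigma(v), \ldots, \sigma^{n-1}(v))$$

is an ordered basis for $\langle v \rangle$, and the matrix $[\sigma]_{\mathcal{B}}$ is the
companion matrix $C[m_\sigma(x)]$ of $m_\sigma(x)$. ∎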
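
To make the companion matrix concrete, the following Python sketch
builds $C[m(x)]$ from the coefficients $a_0, \ldots, a_{n-1}$ and checks, for one
sample polynomial, that $m(C) = 0$. The helper names companion, mat_mul
and evaluate are hypothetical names chosen for this illustration.

```python
def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0.

    coeffs lists a_0, ..., a_{n-1}; the leading coefficient 1 is implicit.
    """
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1           # subdiagonal of ones: the "shift"
    for i in range(n):
        C[i][n - 1] = -coeffs[i]  # last column holds -a_0, ..., -a_{n-1}
    return C


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def evaluate(coeffs, C):
    """Evaluate the monic polynomial at the matrix C by Horner's rule."""
    n = len(C)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    result = identity             # accounts for the leading coefficient 1
    for a in reversed(coeffs):
        result = mat_mul(result, C)
        result = [[result[i][j] + a * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result


# m(x) = (x^2 + 1)^2 = x^4 + 2x^2 + 1
m = [1, 0, 2, 0]
C = companion(m)
for row in C:
    print(row)
print(evaluate(m, C))  # the 4 x 4 zero matrix
```

Multiplying $C$ by the $i$th standard basis vector returns the $(i+1)$st, which is the shift described above, and the last column records the relation $m(\sigma) = 0$.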



Summary 

The following table summarizes the connection between the 
module concepts and the vector space concepts that we have discussed. 



Module V                          |  Vector Space V
----------------------------------+----------------------------------
Scalar multiplication: $p(x)v$    |  Action of $\tau$: $p(\tau)(v)$
                                  |
Submodule                         |  Invariant subspace
                                  |
Annihilator:                      |  Annihilator:
$\operatorname{ann}(V) = \{p(x) \mid p(x)V = \{0\}\}$  |  $\operatorname{ann}(V) = \{p(x) \mid p(\tau)(V) = \{0\}\}$
                                  |
Monic order $m(x)$ of $V$:        |  Minimal polynomial of $\tau$:
$\operatorname{ann}(V) = \langle m(x) \rangle$          |  $m(x)$ is the monic polynomial of
                                  |  smallest degree for which $m(\tau) = 0$
                                  |
Cyclic submodule:                 |  $\tau$-cyclic subspace:
$\langle v \rangle = \{p(x)v \mid p(x) \in F[x]\}$  |  $\langle v \rangle = \operatorname{span}\{v, \tau(v), \ldots, \tau^{m-1}(v)\}$



The Decomposition of V

We are now ready to translate the cyclic decomposition theorem 
(Theorem 6.8) into the language of V. 



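Theorem 7.8 (The cyclic decomposition theorem for V) Let
$\tau \in \mathcal{L}(V)$, where $\dim(V) < \infty$. If the minimal polynomial of $\tau$ is

$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$

where the monic polynomials $p_i(x)$ are distinct and irreducible, then
$V$ is the direct sum

$$V = V_{p_1} \oplus \cdots \oplus V_{p_n}$$

where

$$V_{p_i} = \{v \in V \mid p_i^{e_i}(\tau)(v) = 0\}$$

is an invariant subspace (submodule) of $V$, and

$$\min(\tau|_{V_{p_i}}) = p_i^{e_i}(x)$$

Moreover, each $V_{p_i}$ can be further decomposed into a direct sum
of $\tau$-cyclic subspaces (cyclic submodules)

$$V_{p_i} = \langle v_{i,1} \rangle \oplus \cdots \oplus \langle v_{i,k_i} \rangle$$

where

$$\min(\tau|_{\langle v_{i,j} \rangle}) = p_i^{e_{i,j}}(x)$$

and

$$e_i = e_{i,1} \geq e_{i,2} \geq \cdots \geq e_{i,k_i}$$

The elementary divisors $p_i^{e_{i,j}}(x)$ of $V$, also known as the elementary
divisors of $\tau$, are uniquely determined by the operator $\tau$.

This yields the decomposition of $V$ into the direct sum of
$\tau$-cyclic subspaces

$$V = (\langle v_{1,1} \rangle \oplus \cdots \oplus \langle v_{1,k_1} \rangle) \oplus \cdots \oplus (\langle v_{n,1} \rangle \oplus \cdots \oplus \langle v_{n,k_n} \rangle)$$

∎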



The Rational Canonical Form 

The cyclic decomposition theorem can be used to determine a set 
of canonical forms for similarity. 

Recall that if $V = S \oplus T$ and if both $S$ and $T$ are invariant
under $\tau$, the pair $(S,T)$ is said to reduce $\tau$. Put another way, $(S,T)$
reduces $\tau$ if the restrictions

$$\tau|_S : S \to S \quad \text{and} \quad \tau|_T : T \to T$$

are linear operators on $S$ and $T$, respectively. Recall also that we
write $\tau = \rho \oplus \sigma$ if there exist subspaces $S$ and $T$ of $V$ for which
$(S,T)$ reduces $\tau$ and

$$\rho = \tau|_S \quad \text{and} \quad \sigma = \tau|_T$$

If $\tau = \rho \oplus \sigma$, then any matrix representations of $\rho$ and $\sigma$ can be
used to construct a matrix representation of $\tau$. This is especially
relevant to our situation, since according to Theorem 7.8,

$$\tau = \tau|_{\langle v_{1,1} \rangle} \oplus \cdots \oplus \tau|_{\langle v_{n,k_n} \rangle} \tag{7.1}$$



Before discussing this further, it will be convenient to introduce
the notational device of a block matrix. If $A$ is an $n \times n$ matrix, and
$B$ is an $m \times m$ matrix, then by the block matrix

$$M = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}_{\text{block}}$$

we mean the $(n+m) \times (n+m)$ matrix whose upper left $n \times n$ submatrix
is $A$, and whose lower right $m \times m$ submatrix is $B$. (Thus, $A$ and $B$
are submatrices of $M$, and not entries.) All other entries in $M$ are 0.
Because of the particular block form of $M$, we also refer to it as a block
diagonal matrix. Clearly, this concept can be extended to more than
two matrices $A$ and $B$.



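Theorem 7.9 Suppose that $\tau = \tau_1 \oplus \tau_2 \in \mathcal{L}(V)$, with corresponding
reducing pair $(S,T)$. Let $\mathcal{C} = (c_1, \ldots, c_s)$ be an ordered basis for $S$,
let $\mathcal{D} = (d_1, \ldots, d_t)$ be an ordered basis for $T$, and let

$$\mathcal{B} = (c_1, \ldots, c_s, d_1, \ldots, d_t)$$

be the corresponding ordered basis for $V$. Then the matrix $[\tau]_{\mathcal{B}}$ has
the block diagonal form

$$[\tau]_{\mathcal{B}} = \begin{bmatrix} [\tau_1]_{\mathcal{C}} & 0 \\ 0 & [\tau_2]_{\mathcal{D}} \end{bmatrix}_{\text{block}}$$

∎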



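Of course, this theorem may be extended to apply to multiple
direct summands. In particular, referring to (7.1), if $\mathcal{B}_{i,j}$ is an ordered
basis for the cyclic submodule $\langle v_{i,j} \rangle$, and if

$$\mathcal{B} = (\mathcal{B}_{1,1}, \ldots, \mathcal{B}_{n,k_n})$$

denotes the ordered basis for $V$, obtained (as in Theorem 7.9) from
these ordered bases, then

$$[\tau]_{\mathcal{B}} = \begin{bmatrix}
[\tau_{1,1}]_{\mathcal{B}_{1,1}} & & \\
& \ddots & \\
& & [\tau_{n,k_n}]_{\mathcal{B}_{n,k_n}}
\end{bmatrix}_{\text{block}}$$

where $\tau_{i,j} = \tau|_{\langle v_{i,j} \rangle}$.

Now, the cyclic submodule $\langle v_{i,j} \rangle$ has monic order $p_i^{e_{i,j}}(x)$, that
is, the restriction $\tau_{i,j}$ has minimal polynomial $p_i^{e_{i,j}}(x)$. Thus, if

$$\deg p_i^{e_{i,j}}(x) = d_{i,j}$$

then

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau_{i,j}(v_{i,j}), \ldots, \tau_{i,j}^{d_{i,j}-1}(v_{i,j}))$$

is an ordered basis for $\langle v_{i,j} \rangle$. Hence, we arrive at the matrix
representation of $\tau$ described in the following theorem.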



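Theorem 7.10 Let $\dim(V) < \infty$, and suppose that $\tau \in \mathcal{L}(V)$ has
minimal polynomial

$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$

where the monic polynomials $p_i(x)$ are distinct and irreducible. Then
we can write

$$V = (\langle v_{1,1} \rangle \oplus \cdots \oplus \langle v_{1,k_1} \rangle) \oplus \cdots \oplus (\langle v_{n,1} \rangle \oplus \cdots \oplus \langle v_{n,k_n} \rangle)$$

where $\langle v_{i,j} \rangle$ is a $\tau$-cyclic subspace of $V$. The minimal polynomials for
$\tau_{i,j} = \tau|_{\langle v_{i,j} \rangle}$ are the elementary divisors

$$\min(\tau_{i,j}) = p_i^{e_{i,j}}(x)$$

of $V$, where

$$e_i = e_{i,1} \geq e_{i,2} \geq \cdots \geq e_{i,k_i}$$

These elementary divisors are uniquely determined by $\tau$. Furthermore,
if $\deg p_i^{e_{i,j}}(x) = d_{i,j}$, then

$$\mathcal{B}_{i,j} = (v_{i,j}, \tau_{i,j}(v_{i,j}), \ldots, \tau_{i,j}^{d_{i,j}-1}(v_{i,j}))$$

is an ordered basis for $\langle v_{i,j} \rangle$, and the matrix of $\tau$ with respect to the
ordered basis

$$\mathcal{B} = (\mathcal{B}_{1,1}, \ldots, \mathcal{B}_{n,k_n})$$

is the block diagonal matrix

$$[\tau]_{\mathcal{B}} = \begin{bmatrix}
C[p_1^{e_{1,1}}(x)] & & & & \\
& \ddots & & & \\
& & C[p_1^{e_{1,k_1}}(x)] & & \\
& & & \ddots & \\
& & & & C[p_n^{e_{n,k_n}}(x)]
\end{bmatrix}_{\text{block}} \tag{7.2}$$

The matrix on the right is called the rational canonical form of $\tau$. ∎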
Let us denote the matrix on the right of (7.2) by

$$\operatorname{diag}(C[p_1^{e_{1,1}}(x)], \ldots, C[p_n^{e_{n,k_n}}(x)])$$

Theorem 7.10 implies that, for any $\tau \in \mathcal{L}(V)$, we can find an ordered
basis $\mathcal{B}$ for which the matrix $[\tau]_{\mathcal{B}}$ has the rational canonical form
(7.2). On the other hand, $\tau$ has only one rational canonical form (up
to reordering of the blocks on the diagonal). To see this, suppose that,
for some ordered basis $\mathcal{C}$ for $V$, the matrix $[\tau]_{\mathcal{C}}$ has the form

$$[\tau]_{\mathcal{C}} = \operatorname{diag}(C[q_1^{f_{1,1}}(x)], \ldots, C[q_m^{f_{m,l_m}}(x)])$$

Then $V$ can be written as a direct sum of cyclic submodules, with
elementary divisors $q_i^{f_{i,j}}(x)$. Hence, the uniqueness of the rational
canonical form (up to reordering of the blocks on the diagonal) follows
from the uniqueness of the cyclic decomposition of $V$.

Theorem 7.10 can be reformulated in terms of matrices as follows.

Theorem 7.11 Any square matrix $A$ is similar to a unique (except for
the order of the blocks on the diagonal) matrix that is in rational
canonical form. ∎

Corollary 7.12 Two matrices over the same field $F$ are similar if and
only if they have the same elementary divisors. ∎

We will not go into the details of how best to find the rational 
canonical form of a matrix, since our main interest in this form is as a 
theoretical tool. However, for concreteness, here are some examples of 
rational canonical forms. 

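Example 7.1 Let $\tau$ be a linear operator on the vector space $\mathbb{R}^7$, and
suppose that $\tau$ has minimal polynomial

$$m_\tau(x) = (x-1)(x^2+1)^2$$

Noting that $x - 1$ and $(x^2+1)^2$ are elementary divisors, we have the
following possibilities for the list of elementary divisors.

1) $x - 1$, $(x^2+1)^2$, $x^2+1$

2) $x - 1$, $x - 1$, $x - 1$, $(x^2+1)^2$

Since $C[x-1] = [1]$ and $(x^2+1)^2 = x^4 + 2x^2 + 1$, these correspond to
the following rational canonical forms

$$1) \quad \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & -2 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}
\qquad
2) \quad \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & -2 \\
0 & 0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}$$

□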
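
As a check on Example 7.1, a rational canonical form can be assembled
mechanically from a list of elementary divisors: form the companion
matrix of each, following the convention of this chapter, and place
the blocks along the diagonal. The sketch below does this in Python.
The helper names companion and block_diag, and the representation of a
monic polynomial by its list of lower coefficients $a_0, \ldots, a_{n-1}$,
are our own assumptions.

```python
def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1           # ones on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -coeffs[i]  # last column is -a_0, ..., -a_{n-1}
    return C


def block_diag(blocks):
    """Block diagonal matrix with the given square blocks on the diagonal."""
    size = sum(len(b) for b in blocks)
    M = [[0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                M[offset + i][offset + j] = entry
        offset += len(b)
    return M


x_minus_1 = [-1]               # x - 1
x2_plus_1 = [1, 0]             # x^2 + 1
x2_plus_1_sq = [1, 0, 2, 0]    # (x^2 + 1)^2 = x^4 + 2x^2 + 1

form_1 = block_diag([companion(x_minus_1),
                     companion(x2_plus_1_sq),
                     companion(x2_plus_1)])
form_2 = block_diag([companion(x_minus_1)] * 3 + [companion(x2_plus_1_sq)])

for row in form_1:
    print(row)
```

Both forms are $7 \times 7$, as they must be, since the degrees of the elementary divisors in each list add up to $\dim(\mathbb{R}^7) = 7$.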



EXERCISES 

1. Show that a subset S of V is a submodule of the F[x]-module 
V if and only if it is an invariant subspace of the vector space V. 

2. Show that the units in F[x] are precisely the nonzero scalars in 

F. 

3. Verify that the concept of the minimal polynomial of a matrix is 
well-defined. Prove Theorem 7.6. 

4. We have seen that any $\tau \in \mathcal{L}(V)$ can be used to make $V$ into
an $F[x]$-module. Does every $F[x]$-module $V$ come from some
$\tau \in \mathcal{L}(V)$? Explain.

5. Formulate an invariant factor version of Theorem 7.10. 

6. Referring to the discussion immediately following Theorem 7.10,
show that the rational canonical form of $\tau$ is unique, up to the
order of the block diagonal matrices.

7. Prove that the minimal polynomial of $\tau \in \mathcal{L}(V)$ is the least
common multiple of its elementary divisors.

8. Let $\mathbb{Q}$ be the field of rational numbers. Consider the linear
operator $\tau \in \mathcal{L}(\mathbb{Q}^2)$ defined by $\tau(e_1) = e_2$, $\tau(e_2) = -e_1$.

a) Find the minimal polynomial for $\tau$, and show that the
rational canonical form for $\tau$ is

$$R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

What are the elementary divisors of $\tau$?

b) Now consider the map $\sigma \in \mathcal{L}(\mathbb{C}^2)$ defined by the same rules
as $\tau$, namely, $\sigma(e_1) = e_2$, $\sigma(e_2) = -e_1$. Find the minimal
polynomial for $\sigma$, and the rational canonical form for $\sigma$.
What are the elementary divisors of $\sigma$?

c) The invariant factors of $\tau$ are defined, using the elementary
divisors of $\tau$, in the same way as we did at the end of
Chapter 6, for a module $M$. Describe the invariant factors for
the operators in parts (a) and (b).

9. Find all rational canonical forms (up to the order of the blocks on 

the diagonal) for a linear operator on having minimal 

polynomial (x — l)^(x + 1)^. 

10. How many possible rational canonical forms (up to order of 
blocks) are there for linear operators on R^, with minimal 
polynomial (x — l)(x + 1)^? 

11. Prove that if C is the companion matrix of p(x), then p(C) = 
0, and C has minimal polynomial p(x). 

12. Let r be a linear operator on F^, with minimal polynomial 
m^(x) = (x^ -f l)(x^ — 2). Find the rational canonical form for r 
if (a) F=:Q, (b) F=:R, (c) F = C. 




CHAPTER 8 

Eigenvalues and Eigenvectors 



Contents: The Characteristic Polynomial of an Operator. Eigenvalues
and Eigenvectors. The Cayley-Hamilton Theorem. The Jordan 
Canonical Form. Geometric and Algebraic Multiplicities. 
Diagonalizable Operators. Projections. The Algebra of Projections. 
Resolutions of the Identity. Projections and Diagonalizability. 
Projections and Invariance. Exercises. 



Unless otherwise noted, we will assume throughout this chapter that all 
vector spaces are finite dimensional. 



The Characteristic Polynomial of an Operator 

Let us compute the determinant of the matrix $xI - R$, where $R$
is the rational canonical form of $\tau$. To do this, we need the following
result, whose proof is left to the reader.

Lemma 8.1 If a square matrix $M$ has the block diagonal form

$$M = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}_{\text{block}}$$

where $A$ and $B$ are square, then $\det(M) = \det(A)\det(B)$. ∎



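Now, let $C[p(x)]$ be the companion matrix of the polynomial
$p(x) = a_0 + a_1 x + \cdots + a_{n-1}x^{n-1} + x^n$, and let

$$A = xI - C[p(x)] = \begin{bmatrix}
x & 0 & \cdots & 0 & a_0 \\
-1 & x & \cdots & 0 & a_1 \\
0 & -1 & \cdots & 0 & a_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & x & a_{n-2} \\
0 & 0 & \cdots & -1 & x + a_{n-1}
\end{bmatrix}$$

To compute the determinant of this matrix, we indicate its dependence
on the coefficients $a_i$ by writing $A = A(x; a_0, \ldots, a_{n-1})$, and then look
at some simple cases

$$\det(A(x; a_0, a_1)) = \begin{vmatrix} x & a_0 \\ -1 & x + a_1 \end{vmatrix} = x(x + a_1) + a_0 = a_0 + a_1 x + x^2 = p(x)$$

$$\det(A(x; a_0, a_1, a_2)) = \begin{vmatrix} x & 0 & a_0 \\ -1 & x & a_1 \\ 0 & -1 & x + a_2 \end{vmatrix}
= x \begin{vmatrix} x & a_1 \\ -1 & x + a_2 \end{vmatrix} + a_0 \begin{vmatrix} -1 & x \\ 0 & -1 \end{vmatrix}$$

$$= x[x(x + a_2) + a_1] + a_0 = a_0 + a_1 x + a_2 x^2 + x^3 = p(x)$$

In general, expanding along the first row gives

$$\det(A(x; a_0, \ldots, a_{n-1})) = x \det(A(x; a_1, \ldots, a_{n-1})) + (-1)^{n+1}(-1)^{n-1} a_0$$

$$= x \det(A(x; a_1, \ldots, a_{n-1})) + a_0$$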

An induction argument thus leads to the following. 

Lemma 8.2 If $C[p(x)]$ is the companion matrix of a polynomial $p(x)$,
then

$$\det(xI - C[p(x)]) = p(x)$$

∎
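
Lemma 8.2 is easy to test by machine. The sketch below computes the
coefficients of $\det(xI - A)$ by the Faddeev-LeVerrier recursion, which
is a different route from the cofactor expansion used above, and
compares the result for a companion matrix with the polynomial it came
from. The recursion divides by $1, \ldots, n$, so the sketch assumes
characteristic 0 and works over the rationals. All function names are
our own.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of x^n + a_{n-1} x^{n-1} + ... + a_0."""
    n = len(coeffs)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -coeffs[i]
    return C


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(xI - A)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(0)] * n + [Fraction(1)]
    M = identity
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        # c_{n-k} = -trace(A M_k) / k, then M_{k+1} = A M_k + c_{n-k} I.
        coeffs[n - k] = -sum(AM[i][i] for i in range(n)) / k
        M = [[AM[i][j] + coeffs[n - k] * identity[i][j] for j in range(n)]
             for i in range(n)]
    return coeffs


p = [1, 0, 2, 0]  # p(x) = x^4 + 2x^2 + 1, leading 1 implicit
print(char_poly(companion(p)) == p + [1])  # True
```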

Combining Lemmas 8.1 and 8.2 gives the following. 

Theorem 8.3 If $R$ is the rational canonical form for $\tau \in \mathcal{L}(V)$, then

$$C_\tau(x) = \det(xI - R) = \prod_{i,j} p_i^{e_{i,j}}(x)$$

This determinant is called the characteristic polynomial of $\tau$. ∎

The characteristic polynomial is often defined first for matrices
and then for linear operators. The characteristic polynomial of a square
matrix $A$ is defined to be $C_A(x) = \det(xI - A)$.

Theorem 8.4

1) If $A$ is similar to $B$, then $C_A(x) = C_B(x)$. Thus, the
characteristic polynomial is an invariant under similarity.

2) The characteristic polynomial of an operator $\tau$ is equal to the
characteristic polynomial of any matrix that represents $\tau$.

3) The characteristic polynomial of an operator $\tau$ is the product of
the elementary divisors of $\tau$. ∎



Even though the characteristic polynomial is an invariant under 
similarity (as is the minimal polynomial), the matrices 




and 



B = 



S 0 
1 6 



which have the same characteristic polynomial but are not similar, 
show that the characteristic polynomial is not a complete invariant. 



Eigenvalues and Eigenvectors 

Notice that $\lambda \in F$ is a root of the characteristic polynomial
$C_\tau(x)$ of a linear operator $\tau \in \mathcal{L}(V)$ if and only if

$$\det(\lambda I - R) = 0 \tag{8.1}$$

that is, if and only if the matrix $\lambda I - R$ is singular. In particular, if
$\dim(V) = d$, then the rational canonical form $R$ for $\tau$ has size $d \times d$,
and so (8.1) holds if and only if there exists a nonzero vector $x \in F^d$
for which

$$(\lambda I - R)x = 0$$

or

$$\tau_R(x) = \lambda x$$

If $v \in V$ is the nonzero vector for which $[v]_{\mathcal{B}} = x$, where $\mathcal{B}$ is the
ordered basis used to represent $\tau$ by $R$, then this is equivalent to

$$\tau(v) = \lambda v$$

This prompts the following definition, which applies to vector spaces of
arbitrary dimension.



Definition Let $\tau \in \mathcal{L}(V)$ be a linear operator. A scalar $\lambda \in F$ is an
eigenvalue (or characteristic value) of $\tau$ if there exists a nonzero vector
$v \in V$ for which

$$\tau(v) = \lambda v$$

In this case, $v$ is an eigenvector (or characteristic vector) of $\tau$,
associated with $\lambda$.

If $A$ is a matrix over $F$, then $\lambda \in F$ is an eigenvalue for $A$ if
there exists a nonzero column vector $x$ for which

$$Ax = \lambda x$$

In this case, $x$ is an eigenvector (or characteristic vector) for $A$,
associated with $\lambda$. □

The set of all eigenvectors associated to a given eigenvalue $\lambda$,
together with the zero vector, forms a subspace of $V$, called the
eigenspace of $\lambda$. We will denote the eigenspace of an eigenvalue $\lambda$ by
$\mathcal{E}_\lambda$. This applies to both linear operators and matrices.

The following theorems summarize the key facts about eigenvalues 
and eigenvectors. 

Theorem 8.5

1) A scalar $\lambda \in F$ is an eigenvalue of $\tau \in \mathcal{L}(V)$ if and only if it is a
root of the characteristic polynomial $C_\tau(x)$ of $\tau$.

2) A scalar $\lambda \in F$ is an eigenvalue of $\tau \in \mathcal{L}(V)$ if and only if it is a
root of the minimal polynomial $m_\tau(x)$ of $\tau$.

3) A scalar $\lambda \in F$ is an eigenvalue of $\tau$ if and only if it is an
eigenvalue of any matrix that represents $\tau$.

4) The eigenvalues of a matrix are invariants under similarity.

5) If $\lambda$ is an eigenvalue for a matrix $A$, then the eigenspace $\mathcal{E}_\lambda$ is
the solution space to the homogeneous system of equations

$$(\lambda I - A)x = 0$$

Proof. The first part of this theorem has already been established.
Part (2) follows from the fact that the prime factors of the
characteristic polynomial $C_\tau(x)$ and the minimal polynomial $m_\tau(x)$
are the same.

As for part (3), $\lambda$ is an eigenvalue of $\tau$ if and only if

$$\tau(v) = \lambda v \tag{8.2}$$

for some nonzero $v \in V$. Now, suppose that $\dim(V) = d$, let $\mathcal{B}$ be an
ordered basis for $V$, and let $\phi_{\mathcal{B}}: V \to F^d$ be the isomorphism defined by
$\phi_{\mathcal{B}}(u) = [u]_{\mathcal{B}}$. Then, if $A = [\tau]_{\mathcal{B}}$, we have (cf. Figure 3.2)

$$\tau = \phi_{\mathcal{B}}^{-1} \, \tau_A \, \phi_{\mathcal{B}}$$

and so (8.2) is equivalent to

$$\tau_A(\phi_{\mathcal{B}}(v)) = \lambda \phi_{\mathcal{B}}(v)$$

or

$$A[v]_{\mathcal{B}} = \lambda [v]_{\mathcal{B}}$$

which says that $\lambda$ is an eigenvalue for $A$. Hence, $\lambda$ is an eigenvalue
for $\tau$ if and only if it is an eigenvalue for $A$. This proves part (3).
Part (4) follows from part (3), and part (5) is evident. ∎
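
Part (5) of Theorem 8.5 turns the computation of an eigenspace into
the solution of a homogeneous linear system. The following Python
sketch solves that system by exact Gauss-Jordan elimination over the
rationals. The function names null_space and eigenspace are
hypothetical, and the scalar passed in is simply tested, not assumed to
be an eigenvalue: an empty basis means that it is not one.

```python
from fractions import Fraction


def null_space(M):
    """Basis of the solution space of Mx = 0, by Gauss-Jordan elimination."""
    rows, cols = len(M), len(M[0])
    M = [[Fraction(x) for x in row] for row in M]
    pivots = []
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # One basis vector per free (non-pivot) column.
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        x = [Fraction(0)] * cols
        x[free] = Fraction(1)
        for row, c in enumerate(pivots):
            x[c] = -M[row][free]
        basis.append(x)
    return basis


def eigenspace(A, lam):
    """Basis of the solution space of (lam I - A)x = 0."""
    n = len(A)
    shifted = [[lam * int(i == j) - A[i][j] for j in range(n)]
               for i in range(n)]
    return null_space(shifted)


A = [[2, 1], [0, 2]]
print(eigenspace(A, 2))  # one basis vector: the eigenspace is a line
print(eigenspace(A, 3))  # empty: 3 is not an eigenvalue of A
```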

Theorem 8.6 Suppose that $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of a
linear operator $\tau \in \mathcal{L}(V)$. Then $\mathcal{E}_{\lambda_i} \cap \mathcal{E}_{\lambda_j} = \{0\}$ for $i \neq j$. Moreover,
eigenvectors associated with distinct eigenvalues are linearly
independent. That is, if $v_i \in \mathcal{E}_{\lambda_i}$ is nonzero for $i = 1, \ldots, k$, then the
vectors $\{v_1, \ldots, v_k\}$ are linearly independent.

Proof. We leave it to the reader to show that $\mathcal{E}_{\lambda_i} \cap \mathcal{E}_{\lambda_j} = \{0\}$. Let
$v_i \in \mathcal{E}_{\lambda_i}$ be nonzero, for $i = 1, \ldots, k$, where $\lambda_1, \ldots, \lambda_k$ are distinct
eigenvalues of $\tau$. We want to show that the $v_i$'s are linearly
independent. Assuming that the $v_i$'s are linearly dependent, we may
also assume (after renumbering if necessary) that, among all nontrivial
linear combinations of these vectors that equal 0, the equation

$$r_1 v_1 + \cdots + r_j v_j = 0 \tag{8.3}$$

is the shortest such equation (that is, has the fewest terms). Applying
$\tau$ gives

$$r_1 \tau(v_1) + \cdots + r_j \tau(v_j) = 0$$

or

$$r_1 \lambda_1 v_1 + \cdots + r_j \lambda_j v_j = 0 \tag{8.4}$$

Now we multiply (8.3) by $\lambda_1$, and subtract from (8.4), to get

$$r_2(\lambda_2 - \lambda_1)v_2 + \cdots + r_j(\lambda_j - \lambda_1)v_j = 0$$

But this is a shorter equation than (8.3), and so all of the coefficients
must equal 0, and since the $\lambda_i$'s are distinct, we deduce that $r_i = 0$
for $i = 2, \ldots, j$, and so $r_1 = 0$ as well, because $v_1 \neq 0$. This
contradiction implies that the $v_i$'s are linearly independent. ∎

One way to compute the eigenvalues of a linear operator $\tau$ is to
first represent $\tau$ by a matrix $A$, and then solve the characteristic
equation

$$C_A(x) = 0$$

Unfortunately, however, it is quite likely that we cannot solve this
polynomial equation when $\deg C_A(x) = \dim(V) > 3$. As a result, the
art of approximating the eigenvalues of a matrix is a very important
area of applied linear algebra.



The Cayley-Hamilton Theorem

Since the characteristic polynomial $C_\tau(x)$ of a linear operator $\tau$
is the product of its elementary divisors, and since the minimal
polynomial of $\tau$ is the product

$$m_\tau(x) = p_1^{e_1}(x) \cdots p_n^{e_n}(x)$$

in which each factor $p_i^{e_i}(x) = p_i^{e_{i,1}}(x)$ is one of those elementary
divisors, we deduce that $m_\tau(x) \mid C_\tau(x)$. This important result is
referred to as the Cayley-Hamilton theorem.

Theorem 8.7 Let $\tau \in \mathcal{L}(V)$.

1) The minimal polynomial $m_\tau(x)$ and the characteristic
polynomial $C_\tau(x)$ have the same prime factors.

2) (The Cayley-Hamilton theorem) $m_\tau(x) \mid C_\tau(x)$, or equivalently,
$\tau$ satisfies its own characteristic polynomial, that is, $C_\tau(\tau) = 0$. ∎
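
For a $2 \times 2$ matrix the Cayley-Hamilton theorem can be verified by
hand, since in that case $C_A(x) = x^2 - \operatorname{tr}(A)\,x + \det(A)$. The short
Python sketch below evaluates $A^2 - \operatorname{tr}(A)A + \det(A)I$ directly. The
function name is our own, and the restriction to the $2 \times 2$ case is a
simplification made for this illustration.

```python
def cayley_hamilton_residual(A):
    """Return A^2 - tr(A) A + det(A) I for a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    # Entries of A^2, written out explicitly.
    A2 = [[a * a + b * c, a * b + b * d],
          [c * a + d * c, c * b + d * d]]
    return [[A2[0][0] - trace * a + det, A2[0][1] - trace * b],
            [A2[1][0] - trace * c, A2[1][1] - trace * d + det]]


print(cayley_hamilton_residual([[1, 2], [3, 4]]))  # [[0, 0], [0, 0]]
```

Expanding the four entries symbolically shows that every term cancels, which is the theorem in this smallest nontrivial case.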



The Jordan Canonical Form 

One of the virtues of the rational canonical form is that every
linear operator $\tau$ on a finite dimensional vector space has a rational
canonical form. That is, the set of all matrices in rational canonical
form constitutes a set of canonical forms (at least up to the order of the
blocks on the diagonal). Unfortunately, however, the rational canonical
form of a matrix may be far from the ideal of simplicity that we had in
mind for a set of simple canonical forms.

Fortunately, in certain important cases, we can do better than the 
rational canonical form. In particular, let us consider the case of a 
linear operator r G «L(V) whose minimal polynomial factors into a 
product of linear factors 

(8.5) m^(x) = (x - Ai)®i. • .(x - 

When a polynomial factors into a product of linear factors over a field 
F, we say that the polynomial splits over F. 

To put this in perspective, we note that a field F is said to be 
algebraically closed if every nonconstant polynomial over F has a root 
in F. Thus, the only irreducible polynomials over an algebraically 
closed field are the linear polynomials, and so any nonconstant 
polynomial over F splits over F. For example, the complex numbers 
C form an algebraically closed field, and so any linear operator over a 
complex vector space has minimal polynomial that splits over C. 

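In some sense, the "weakness" in the rational canonical form 
comes in choosing the basis for the cyclic submodules $\langle v_{i,j} \rangle$, whose 
monic order is the elementary divisor $p_i^{e_{i,j}}(x)$, of which we know very 
little in general. Recall that, since $\langle v_{i,j} \rangle$ is a $\tau$-cyclic subspace of $V$, 
we have chosen the ordered basis 

$$(v_{i,j}, \tau(v_{i,j}), \ldots, \tau^{d_{i,j}-1}(v_{i,j})), \quad d_{i,j} = \deg p_i^{e_{i,j}}(x)$$

However, when the minimal polynomial has the form (8.5), then 
the elementary divisors have the form 

$$p_i^{e_{i,j}}(x) = (x - \lambda_i)^{e_{i,j}}$$

In this case, we can make a more judicious choice of ordered basis. 
Observe that $\dim(\langle v_{i,j} \rangle) = \deg p_i^{e_{i,j}}(x) = e_{i,j}$, and so it is easy to see that the 
set 

$$\mathcal{C}_{i,j} = (v_{i,j}, (\tau_{i,j} - \lambda_i)(v_{i,j}), \ldots, (\tau_{i,j} - \lambda_i)^{e_{i,j}-1}(v_{i,j}))$$

is an ordered basis for $\langle v_{i,j} \rangle$, where $\tau_{i,j} = \tau|_{\langle v_{i,j} \rangle}$. Furthermore, denoting 
the kth basis vector in $\mathcal{C}_{i,j}$ by $b_k$ (counting from $k = 0$), we have for 
$k = 0, \ldots, e_{i,j} - 2$, 

$$\tau_{i,j}(b_k) = \tau_{i,j}[(\tau_{i,j} - \lambda_i)^k(v_{i,j})] = (\tau_{i,j} - \lambda_i + \lambda_i)[(\tau_{i,j} - \lambda_i)^k(v_{i,j})]$$

$$= (\tau_{i,j} - \lambda_i)^{k+1}(v_{i,j}) + \lambda_i(\tau_{i,j} - \lambda_i)^k(v_{i,j}) = b_{k+1} + \lambda_i b_k$$

For $k = e_{i,j} - 1$, a similar computation, using the fact that 

$$(\tau_{i,j} - \lambda_i)^{k+1}(v_{i,j}) = (\tau_{i,j} - \lambda_i)^{e_{i,j}}(v_{i,j}) = 0$$

gives 

$$\tau_{i,j}(b_k) = \lambda_i b_k$$

Hence, the matrix of $\tau_{i,j} = \tau|_{\langle v_{i,j} \rangle}$ with respect to $\mathcal{C}_{i,j}$ is the 
$e_{i,j} \times e_{i,j}$ matrix 

$$\mathcal{J}(\lambda_i, e_{i,j}) = \begin{bmatrix} \lambda_i & 0 & \cdots & \cdots & 0 \\ 1 & \lambda_i & \ddots & & \vdots \\ 0 & 1 & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda_i \end{bmatrix}$$

This matrix is referred to as a Jordan block associated to the scalar $\lambda_i$. 
Note that a Jordan block has $\lambda_i$'s on the main diagonal, 1s on the 
subdiagonal, and 0s elsewhere. 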
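
The shape of a Jordan block can be made concrete with a short sketch. The helper names below are hypothetical. The code builds the block with 1s on the subdiagonal, as in the text, and checks that subtracting the scalar from the diagonal leaves a nilpotent matrix whose index equals the size of the block.

```python
def jordan_block(lam, size):
    """Jordan block with lam on the diagonal and 1s on the subdiagonal."""
    return [[lam if i == j else (1 if i == j + 1 else 0)
             for j in range(size)] for i in range(size)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(a, k):
    """k-th power of a square matrix, for an integer k >= 0."""
    n = len(a)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, a)
    return result


J = jordan_block(5, 3)
# Column k holds the coordinates of the image of the k-th basis vector,
# so column 0 = (5, 1, 0) encodes tau(b_0) = 5*b_0 + b_1.
N = [[J[i][j] - (5 if i == j else 0) for j in range(3)] for i in range(3)]
print(J)                   # [[5, 0, 0], [1, 5, 0], [0, 1, 5]]
print(matrix_power(N, 3))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```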

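Now we can state the analog of Theorem 7.10 for this new choice 
of ordered basis. 

Theorem 8.8 Suppose that the minimal polynomial of an operator 
$\tau \in \mathcal{L}(V)$ splits over the base field $F$, that is, 

$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_n)^{e_n}$$

Then we can write 

$$V = (\langle v_{1,1} \rangle \oplus \cdots \oplus \langle v_{1,k_1} \rangle) \oplus \cdots \oplus (\langle v_{n,1} \rangle \oplus \cdots \oplus \langle v_{n,k_n} \rangle)$$

where $\langle v_{i,j} \rangle$ is a $\tau$-cyclic subspace of $V$. The minimal polynomials for 
$\tau_{i,j} = \tau|_{\langle v_{i,j} \rangle}$ are the elementary divisors 

$$\min(\tau_{i,j}) = (x - \lambda_i)^{e_{i,j}}$$

of $V$, where 

$$e_i = e_{i,1} \ge e_{i,2} \ge \cdots \ge e_{i,k_i}$$

These elementary divisors are uniquely determined by $\tau$. Furthermore, 
the set 

(8.6)  $\mathcal{C}_{i,j} = (v_{i,j}, (\tau_{i,j} - \lambda_i)(v_{i,j}), \ldots, (\tau_{i,j} - \lambda_i)^{e_{i,j}-1}(v_{i,j}))$

is an ordered basis for $\langle v_{i,j} \rangle$, and the matrix of $\tau$ with respect to the 
ordered basis 

$$\mathcal{C} = (\mathcal{C}_{1,1}, \ldots, \mathcal{C}_{1,k_1}, \ldots, \mathcal{C}_{n,1}, \ldots, \mathcal{C}_{n,k_n})$$

is the block matrix 

$$[\tau]_{\mathcal{C}} = \begin{bmatrix} \mathcal{J}(\lambda_1, e_{1,1}) & & & & \\ & \ddots & & & \\ & & \mathcal{J}(\lambda_1, e_{1,k_1}) & & \\ & & & \ddots & \\ & & & & \mathcal{J}(\lambda_n, e_{n,k_n}) \end{bmatrix}_{\text{block}}$$

The matrix on the right is called the Jordan canonical form of $\tau$. ■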

We leave it to the reader to show that, for an algebraically closed 
field F, the set of matrices that are in Jordan canonical form is indeed 
a set of canonical forms for similarity (at least up to the order of the 
Jordan blocks). In other words, every matrix over F is similar to 
exactly one matrix that is in Jordan canonical form (again up to order 
of the Jordan blocks). 

Note that if $\tau$ has Jordan canonical form $\mathcal{J}$, then the diagonal 
elements of $\mathcal{J}$ are precisely the roots of the characteristic polynomial 
$C_\tau(x)$, including multiplicities. In other words, the number of times 
each diagonal element appears in $\mathcal{J}$ is the multiplicity of that element 
as a root of the characteristic polynomial. 



Geometric and Algebraic Multiplicities 

If $\lambda \in F$ is an eigenvalue of a linear operator $\tau$, then the 
multiplicity of $\lambda$, as a root of the characteristic polynomial $C_\tau(x)$, is 
called the algebraic multiplicity of $\lambda$. On the other hand, the 
dimension of the eigenspace $\mathcal{E}_\lambda$ is called the geometric multiplicity 
of $\lambda$. 

Theorem 8.9 The geometric multiplicity of an eigenvalue $\lambda$ of 
$\tau \in \mathcal{L}(V)$ is less than or equal to its algebraic multiplicity. 

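Proof. Suppose that $\lambda$ is an eigenvalue of $\tau$. Thus, 

$$m_\tau(x) = (x - \lambda)^e p(x)$$

where $x - \lambda$ does not divide $p(x)$. Consider the rational canonical 
form for $\tau$. In the primary decomposition of $V$, we have 

$$V = V_{p_1} \oplus \cdots \oplus V_{p_n}$$

where we may assume that 

$$V_{p_1} = \{v \in V \mid (x - \lambda)^e v = 0\}$$

The cyclic decomposition of this primary submodule is 

$$V_{p_1} = \langle v_1 \rangle \oplus \cdots \oplus \langle v_k \rangle$$

with elementary divisors 

$$\min(\tau_j) = (x - \lambda)^{e_j}$$

where $\tau_j = \tau|_{\langle v_j \rangle}$ and 

$$e = e_1 \ge e_2 \ge \cdots \ge e_k$$

According to Theorem 8.4, the algebraic multiplicity of $\lambda$ is 

$$\text{alg. mult.} = \sum_{j=1}^{k} e_j = \sum_{j=1}^{k} \dim(\langle v_j \rangle) = \dim(V_{p_1})$$

Now, 

$$v \in \mathcal{E}_\lambda \Rightarrow \tau(v) = \lambda v \Rightarrow (\tau - \lambda)(v) = 0 \Rightarrow (\tau - \lambda)^e(v) = 0 \Rightarrow v \in V_{p_1}$$

and so $\mathcal{E}_\lambda \subseteq V_{p_1}$, that is, all eigenvectors associated to $\lambda$ lie in $V_{p_1}$. 
In fact, we can say more. Recall that the set 

$$\mathcal{C}_j = (v_j, (\tau_j - \lambda)(v_j), \ldots, (\tau_j - \lambda)^{e_j - 1}(v_j))$$

is a basis for $\langle v_j \rangle$, for all $j = 1, \ldots, k$, and that 

$$(\tau_j - \lambda)^{e_j}(v_j) = 0$$

If 

$$u = \sum_{i=0}^{e_j - 1} r_i (\tau - \lambda)^i(v_j)$$

is an eigenvector in $\langle v_j \rangle$, then $(\tau - \lambda)(u) = 0$, that is, 

$$0 = \sum_{i=0}^{e_j - 1} r_i (\tau - \lambda)^{i+1}(v_j) = \sum_{i=0}^{e_j - 2} r_i (\tau - \lambda)^{i+1}(v_j)$$

and so $r_i = 0$ for $i = 0, \ldots, e_j - 2$, which implies that 

$$u = r_{e_j - 1}(\tau - \lambda)^{e_j - 1}(v_j)$$

Hence, the only eigenvectors in $\langle v_j \rangle$ are the nonzero scalar multiples 
of the vector 

$$(\tau - \lambda)^{e_j - 1}(v_j)$$

Since each $\langle v_j \rangle$ is invariant under $\tau$, a vector in $V_{p_1}$ is annihilated by 
$\tau - \lambda$ if and only if each of its components in the subspaces $\langle v_j \rangle$ is. 
This shows that the geometric multiplicity of $\lambda$ is the number $k$ of 
$\tau$-cyclic subspaces $\langle v_j \rangle$ that form $V_{p_1}$. Since the algebraic multiplicity 
is the sum of the dimensions of these $\tau$-cyclic subspaces, the theorem 
follows. ■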
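
The two multiplicities can be compared on a small matrix in Jordan canonical form. The sketch below is only an illustration with hypothetical helper names. It computes the geometric multiplicity as the nullity of $A - \lambda I$, using exact rational arithmetic, and reads the algebraic multiplicity off the diagonal, which is legitimate here only because the example matrix is triangular.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def geometric_multiplicity(matrix, lam):
    """Dimension of the kernel of (matrix - lam * I)."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


def algebraic_multiplicity_triangular(matrix, lam):
    """Number of diagonal entries equal to lam (valid for triangular matrices)."""
    return sum(1 for i in range(len(matrix)) if matrix[i][i] == lam)


# Jordan blocks J(2,2), J(2,1) and J(5,1) placed along the diagonal.
J = [[2, 0, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]
print(geometric_multiplicity(J, 2), algebraic_multiplicity_triangular(J, 2))  # 2 3
```

The eigenvalue 2 has two Jordan blocks, so its geometric multiplicity is 2, while the sizes of those blocks add up to its algebraic multiplicity 3.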

Diagonalizable Operators 

We are now in a position to give several different characterizations 
of diagonalizable linear operators, that is, operators that can be 
represented by a diagonal matrix. The first characterization amounts 
to little more than the definitions of the concepts involved. 

Theorem 8.10 An operator $\tau \in \mathcal{L}(V)$ is diagonalizable if and only if 
there is a basis for $V$ that consists entirely of eigenvectors of $\tau$, that 
is, if and only if 

$$\dim(\mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}) = \dim(V)$$

where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of $\tau$. ■

The Jordan canonical form gives us another characterization of 
diagonalizable operators. For, suppose that $\tau$ is diagonalizable, and 
that 

$$[\tau]_{\mathcal{B}} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_k \end{bmatrix}$$

where the diagonal elements are not necessarily distinct. Suppose 
further that 

$$\lambda_{i_1}, \ldots, \lambda_{i_s}$$

are the distinct diagonal elements. Then the minimal polynomial of $\tau$, 
which is the same as the minimal polynomial of $[\tau]_{\mathcal{B}}$, is 

$$m_\tau(x) = \prod_{j=1}^{s} (x - \lambda_{i_j})$$

Thus, $m_\tau(x)$ is the product of distinct linear factors. 

Conversely, suppose that $\tau \in \mathcal{L}(V)$ has the property that $m_\tau(x)$ 
is the product of distinct linear factors, say 

$$m_\tau(x) = (x - \lambda_1) \cdots (x - \lambda_n)$$

where the $\lambda_i$'s are distinct. Then $\tau$ has a Jordan canonical form. 
Moreover, referring to Theorem 8.8, all of the elementary divisors have 
the form $x - \lambda_i$, and so 

$$\min(\tau_{i,j}) = x - \lambda_i$$

In other words, all nonzero vectors in $\langle v_{i,j} \rangle$ are eigenvectors, and so the basis 
for $V$, constructed in Theorem 8.8, consists only of eigenvectors of $\tau$. 
Hence, $\tau$ is diagonalizable. We have established the following result. 

Theorem 8.11 A linear operator $\tau \in \mathcal{L}(V)$ on a finite dimensional 
vector space is diagonalizable if and only if its minimal polynomial is 
the product of distinct linear factors. ■


Projections 

In order to obtain another characterization of diagonalizable 
operators, we turn to a discussion of a special type of operator. 

Definition Let $V = S \oplus S^c$. The map $\rho: V \to V$ defined by 

$$\rho(s + s^c) = s$$

where $s \in S$ and $s^c \in S^c$, is a linear operator on $V$, called projection 
on $S$ along $S^c$. □

Figure 8.1 illustrates the concept of a projection. 

Figure 8.1 

The following theorem describes projection operators. 

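Theorem 8.12 

1) Let $\rho$ be projection on $S$ along $S^c$. Then 

$$\mathrm{im}(\rho) = S, \quad \ker(\rho) = S^c$$

$$V = \mathrm{im}(\rho) \oplus \ker(\rho)$$

$$v \in \mathrm{im}(\rho) \Leftrightarrow \rho(v) = v$$

Note that the last condition says that a vector is in the image of 
$\rho$ if and only if it is fixed by $\rho$. 

2) Conversely, if $\sigma \in \mathcal{L}(V)$ has the property that 

$$V = \mathrm{im}(\sigma) \oplus \ker(\sigma) \quad \text{and} \quad \sigma|_{\mathrm{im}(\sigma)} = \iota$$

(that is, $\sigma$ fixes every vector in its image), then $\sigma$ is projection 
on $\mathrm{im}(\sigma)$ along $\ker(\sigma)$. ■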

Projection operators (or projections, for short) play a major role in 
the spectral theory of linear operators, which we will discuss in Chapter 
10. Now we turn to some of the basic properties of these operators. 

Theorem 8.13 A linear operator $\rho \in \mathcal{L}(V)$ is a projection if and only if 
it is idempotent, that is, if and only if $\rho^2 = \rho$. 

Proof. To see that the projection operator $\rho$ on $S$ along $S^c$ is 
idempotent, observe that, for any $s \in S$ and $s^c \in S^c$, 

$$\rho^2(s + s^c) = \rho(\rho(s + s^c)) = \rho(s) = s = \rho(s + s^c)$$

and so $\rho^2 = \rho$. Conversely, if $\rho$ is idempotent, let 

$$S = \{v \in V \mid \rho(v) = v\}$$

be the set of all vectors that are fixed by $\rho$. Then $S \subseteq \mathrm{im}(\rho)$. Also, if 
$v \in \mathrm{im}(\rho)$, then $v = \rho(w)$ for some $w \in V$, and so 

$$\rho(v) = \rho^2(w) = \rho(w) = v$$

Hence, $\mathrm{im}(\rho) \subseteq S$, and so $S = \mathrm{im}(\rho)$. In other words, 

$$\rho|_{\mathrm{im}(\rho)} = \iota$$

Now, if $v \in \mathrm{im}(\rho) \cap \ker(\rho)$, we have 

$$\rho(v) = v \quad \text{and} \quad \rho(v) = 0$$

and so $v = 0$. Hence, $\mathrm{im}(\rho) \cap \ker(\rho) = \{0\}$. 

Finally, observe that for $v \in V$, 

$$v = \rho(v) + (v - \rho(v)) \in \mathrm{im}(\rho) + \ker(\rho)$$

and so 

$$V = \mathrm{im}(\rho) \oplus \ker(\rho)$$

An appeal to Theorem 8.12 completes the proof. ■



The Algebra of Projections 

If $\rho$ is a projection, then so is $\iota - \rho$, where $\iota$ is the identity 
operator on $V$, for we have 

$$(\iota - \rho)^2 = \iota - \iota\rho - \rho\iota + \rho^2 = \iota - \rho$$

It is not hard to see that $\ker(\iota - \rho) = \mathrm{im}(\rho)$ and $\mathrm{im}(\iota - \rho) = \ker(\rho)$. 
Hence, if $\rho$ is projection on $S$ along $S^c$, then $\iota - \rho$ is projection on 
$S^c$ along $S$. 

Definition Two projections $\rho, \sigma \in \mathcal{L}(V)$ are orthogonal, written $\rho \perp \sigma$, 
if $\rho\sigma = \sigma\rho = 0$. □

Note that $\rho \perp \sigma$ if and only if 

$$\mathrm{im}(\rho) \subseteq \ker(\sigma) \quad \text{and} \quad \mathrm{im}(\sigma) \subseteq \ker(\rho)$$

The following example shows that it is not enough to have $\rho\sigma = 0$ in 
the definition of orthogonality, since it is possible for $\rho\sigma = 0$ and yet 
$\sigma\rho$ may not even be a projection. 

Example 8.1 Let $V = F^2$, and let 

$$D = \{(x,x) \mid x \in F\}, \quad X = \{(x,0) \mid x \in F\}, \quad Y = \{(0,y) \mid y \in F\}$$

Thus, $D$ is the diagonal, $X$ is the x-axis and $Y$ is the y-axis in $F^2$. 
(The reader may wish to draw pictures in $\mathbb{R}^2$.) Using the notation 
$\rho_{A,B}$ for projection on $A$ along $B$, we have 

$$\rho_{D,X}\rho_{D,Y} = \rho_{D,Y} \ne \rho_{D,X} = \rho_{D,Y}\rho_{D,X}$$

From this we deduce that if $\rho$ and $\sigma$ are projections, it may happen 
that both products $\rho\sigma$ and $\sigma\rho$ are projections, but that they are not 
equal. 

We leave it to the reader to show that $\rho_{Y,X}\rho_{X,D} = 0$ (which is a 
projection), but that $\rho_{X,D}\rho_{Y,X}$ is not a projection. Thus, it may also 
happen that $\rho\sigma$ is a projection (even the zero projection) but that $\sigma\rho$ 
is not a projection. □
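
The claims of Example 8.1 can be checked mechanically over the rationals. The sketch below is an illustration only, and the helper `projection` is a hypothetical name. It builds the matrix of the projection of $\mathbb{Q}^2$ on one line along another from spanning vectors of the two lines, and then compares the products.

```python
from fractions import Fraction


def projection(on, along):
    """Matrix of the projection of Q^2 on span(on) along span(along).

    The matrix P is determined by P(on) = on and P(along) = 0, so the
    two vectors must be linearly independent.
    """
    (s1, s2), (t1, t2) = on, along
    det = s1 * t2 - s2 * t1
    if det == 0:
        raise ValueError("the two lines must be distinct")
    return [[Fraction(s1 * t2, det), Fraction(-s1 * t1, det)],
            [Fraction(s2 * t2, det), Fraction(-s2 * t1, det)]]


def matmul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def is_projection(p):
    """A linear operator is a projection exactly when it is idempotent."""
    return matmul(p, p) == p


D, X, Y = (1, 1), (1, 0), (0, 1)
p_DX, p_DY = projection(D, X), projection(D, Y)
p_YX, p_XD = projection(Y, X), projection(X, D)

print(matmul(p_DX, p_DY) == p_DY, matmul(p_DY, p_DX) == p_DX)  # True True
print(matmul(p_YX, p_XD) == [[0, 0], [0, 0]])                  # True
print(is_projection(matmul(p_XD, p_YX)))                       # False
```

The last product is the nonzero map $(a, b) \mapsto (-b, 0)$, whose square is zero, which is why it fails to be idempotent.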



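If $\rho$ and $\sigma$ are projections, it does not necessarily follow that 
$\rho + \sigma$, $\rho - \sigma$ or $\rho\sigma$ is a projection. The sum $\rho + \sigma$ is a projection if 
and only if 

$$(\rho + \sigma)^2 = \rho + \sigma$$

or 

(8.7)  $\rho\sigma + \sigma\rho = 0$

Multiplying this on the left by $\rho$ and on the right by $\rho$ gives 

$$\rho\sigma + \rho\sigma\rho = 0$$

and 

$$\rho\sigma\rho + \sigma\rho = 0$$

Hence, 

$$\rho\sigma = \sigma\rho$$

which, together with (8.7), gives $2\rho\sigma = 0$. Therefore, if $\mathrm{char}(F) \ne 2$ 
(so that $2 \ne 0$) then $\rho\sigma = 0$ and $\sigma\rho = 0$. This shows that if $\rho + \sigma$ is 
a projection, then $\rho \perp \sigma$. Conversely, if $\rho\sigma = \sigma\rho = 0$, then 
$(\rho + \sigma)^2 = \rho + \sigma$, and so $\rho + \sigma$ is a projection. 

Now suppose that $\rho\sigma = \sigma\rho = 0$, and so $\rho + \sigma$ is a projection. 
To determine the kernel of $\rho + \sigma$, note that 

$$(\rho + \sigma)(v) = 0 \Rightarrow \rho(v) + \sigma(v) = 0 \Rightarrow \rho^2(v) + \rho\sigma(v) = 0 \Rightarrow \rho(v) = 0$$

and so $\ker(\rho + \sigma) \subseteq \ker(\rho)$. In a similar way, $\ker(\rho + \sigma) \subseteq \ker(\sigma)$, and 
so 

$$\ker(\rho + \sigma) \subseteq \ker(\rho) \cap \ker(\sigma)$$

But the reverse inclusion is obvious, and so 

$$\ker(\rho + \sigma) = \ker(\rho) \cap \ker(\sigma)$$

As to the image of $\rho + \sigma$, we know that $\mathrm{im}(\rho + \sigma)$ is the set of 
vectors in $V$ that are fixed by the projection $\rho + \sigma$. Hence, 

$$v \in \mathrm{im}(\rho + \sigma) \Rightarrow v = (\rho + \sigma)(v) = \rho(v) + \sigma(v) \in \mathrm{im}(\rho) + \mathrm{im}(\sigma)$$

and so $\mathrm{im}(\rho + \sigma) \subseteq \mathrm{im}(\rho) + \mathrm{im}(\sigma)$. But notice that 

$$x \in \mathrm{im}(\rho) \cap \mathrm{im}(\sigma) \Rightarrow \rho(x) = x = \sigma(x) \Rightarrow x = \rho(x) = \rho^2(x) = \rho\sigma(x) = 0$$

which implies that $\mathrm{im}(\rho) \cap \mathrm{im}(\sigma) = \{0\}$. Therefore, 

$$\mathrm{im}(\rho + \sigma) \subseteq \mathrm{im}(\rho) \oplus \mathrm{im}(\sigma)$$

To establish the reverse inclusion, observe that if $v \in \mathrm{im}(\rho) \oplus \mathrm{im}(\sigma)$, 
then $v = r + s$, for $r \in \mathrm{im}(\rho)$ and $s \in \mathrm{im}(\sigma)$, and so 

$$(\rho + \sigma)(v) = (\rho + \sigma)(r) + (\rho + \sigma)(s) = r + s = v$$

This implies that $v \in \mathrm{im}(\rho + \sigma)$, and so 

$$\mathrm{im}(\rho + \sigma) = \mathrm{im}(\rho) \oplus \mathrm{im}(\sigma)$$

Let us summarize. 

Theorem 8.14 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections, where $V$ is a vector 
space over a field of characteristic $\ne 2$. Then $\rho + \sigma$ is a projection if 
and only if $\rho \perp \sigma$, in which case $\rho + \sigma$ is projection on $\mathrm{im}(\rho) \oplus \mathrm{im}(\sigma)$ 
along $\ker(\rho) \cap \ker(\sigma)$. ■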
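
A quick check of Theorem 8.14 on integer matrices (so the field has characteristic 0) may help fix ideas. This is a sketch with hypothetical helper names: one orthogonal pair whose sum is again idempotent, and one non-orthogonal pair whose sum is not.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def is_zero(m):
    return all(x == 0 for row in m for x in row)


def is_projection(p):
    return matmul(p, p) == p


def orthogonal(p, q):
    """Orthogonality of projections in the sense of the text: pq = qp = 0."""
    return is_zero(matmul(p, q)) and is_zero(matmul(q, p))


# Orthogonal pair on Q^3: projections on the first and second coordinate axes.
rho = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
sigma = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# Non-orthogonal pair on Q^2: projections on the diagonal along each axis.
alpha = [[0, 1], [0, 1]]
beta = [[1, 0], [1, 0]]

print(orthogonal(rho, sigma), is_projection(add(rho, sigma)))   # True True
print(orthogonal(alpha, beta), is_projection(add(alpha, beta)))  # False False
```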

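Let us next consider the difference $\rho - \sigma$. We know that $\rho - \sigma$ 
is a projection if and only if $\theta = \iota - (\rho - \sigma) = (\iota - \rho) + \sigma$ is a 
projection. Hence, we may apply the previous theorem to this case as 
well, to deduce that $\rho - \sigma$ is a projection if and only if 

$$(\iota - \rho)\sigma = \sigma(\iota - \rho) = 0$$

or, equivalently, 

$$\rho\sigma = \sigma\rho = \sigma$$

Moreover, in this case, $\rho - \sigma = \iota - \theta$ is projection on $\ker(\theta)$ along 
$\mathrm{im}(\theta)$. Again using Theorem 8.14, since $\theta = (\iota - \rho) + \sigma$, we have 

$$\mathrm{im}(\theta) = \mathrm{im}(\iota - \rho) \oplus \mathrm{im}(\sigma) = \ker(\rho) \oplus \mathrm{im}(\sigma)$$

and 

$$\ker(\theta) = \ker(\iota - \rho) \cap \ker(\sigma) = \mathrm{im}(\rho) \cap \ker(\sigma)$$

We have proved the following. 

Theorem 8.15 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections, where $V$ is a vector 
space over a field of characteristic $\ne 2$. Then $\rho - \sigma$ is a projection if 
and only if 

$$\rho\sigma = \sigma\rho = \sigma$$

in which case $\rho - \sigma$ is projection on $\mathrm{im}(\rho) \cap \ker(\sigma)$ along 
$\ker(\rho) \oplus \mathrm{im}(\sigma)$. ■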



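Finally, let us consider the product $\rho\sigma$ of two projections. 

Theorem 8.16 Let $\rho, \sigma \in \mathcal{L}(V)$ be projections. If $\rho\sigma = \sigma\rho$ then $\rho\sigma$ 
is a projection. In this case, $\rho\sigma$ is projection on $\mathrm{im}(\rho) \cap \mathrm{im}(\sigma)$ along 
$\ker(\rho) + \ker(\sigma)$. 

Proof. If $\rho\sigma = \sigma\rho$ then 

$$(\rho\sigma)^2 = \rho\sigma\rho\sigma = \rho^2\sigma^2 = \rho\sigma$$

and so $\rho\sigma$ is a projection. To find the image of $\rho\sigma$, we observe that if 
$v = \rho\sigma(v)$, then 

$$\rho(v) = \rho(\rho\sigma(v)) = \rho\sigma(v) = v$$

and so $v \in \mathrm{im}(\rho)$. Similarly, $v \in \mathrm{im}(\sigma)$ and so 

$$\mathrm{im}(\rho\sigma) \subseteq \mathrm{im}(\rho) \cap \mathrm{im}(\sigma)$$

The reverse inclusion is clear, and so 

$$\mathrm{im}(\rho\sigma) = \mathrm{im}(\rho) \cap \mathrm{im}(\sigma)$$

Next, we observe that if $v \in \ker(\rho\sigma)$, then $\rho\sigma(v) = 0$ and so 
$\sigma(v) \in \ker(\rho)$. Hence, 

$$v = \sigma(v) + (v - \sigma(v)) \in \ker(\rho) + \ker(\sigma)$$

Moreover, if $v = r + s \in \ker(\rho) + \ker(\sigma)$, then 

$$\rho\sigma(v) = \rho\sigma(r + s) = \sigma\rho(r) + \rho\sigma(s) = 0 + 0 = 0$$

and so $v \in \ker(\rho\sigma)$. Thus, 

(8.8)  $\ker(\rho\sigma) = \ker(\rho) + \ker(\sigma)$ ■

We should remark that if $\rho = \sigma$, then $\ker(\rho) = \ker(\sigma)$, and so the 
sum in (8.8) need not be direct. 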

Resolutions of the Identity 

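If $\rho$ is a projection, then 

$$\rho \perp (\iota - \rho) \quad \text{and} \quad \rho + (\iota - \rho) = \iota$$

Let us generalize this to more than two projections. 

Definition If $\rho_1, \ldots, \rho_k$ are projections for which 

1) $\rho_i \perp \rho_j$ for $i \ne j$ 

2) $\rho_1 + \cdots + \rho_k = \iota$ 

then we refer to the sum in (2) as a resolution of the identity. □

The next theorem displays a correspondence between direct sum 
decompositions of $V$ and resolutions of the identity. 

Theorem 8.17 

1) If $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, then 

$$V = \mathrm{im}(\rho_1) \oplus \cdots \oplus \mathrm{im}(\rho_k)$$

2) Conversely, if $V = S_1 \oplus \cdots \oplus S_k$, and $\rho_i$ is projection on $S_i$ 
along $S_1 \oplus \cdots \oplus \hat{S}_i \oplus \cdots \oplus S_k$, where the hat means that the 
corresponding term is missing from the direct sum, then 

$$\rho_1 + \cdots + \rho_k = \iota$$

is a resolution of the identity. 

Proof. To prove (1), suppose that $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of 
the identity. Then for any $v \in V$, we have 

$$v = \iota v = \rho_1(v) + \cdots + \rho_k(v)$$

which shows that $V = \mathrm{im}(\rho_1) + \cdots + \mathrm{im}(\rho_k)$. Now, since the projections 
are orthogonal, for each $i$, we have $\mathrm{im}(\rho_j) \subseteq \ker(\rho_i)$ for all $j \ne i$. 
Hence, 

$$\sum_{j \ne i} \mathrm{im}(\rho_j) \subseteq \ker(\rho_i)$$

which, since $\mathrm{im}(\rho_i) \cap \ker(\rho_i) = \{0\}$, shows that 

$$\mathrm{im}(\rho_i) \cap \sum_{j \ne i} \mathrm{im}(\rho_j) = \{0\}$$

and so 

$$V = \mathrm{im}(\rho_1) \oplus \cdots \oplus \mathrm{im}(\rho_k)$$

To prove part (2), observe that for $i \ne j$, 

$$\mathrm{im}(\rho_i) = S_i \subseteq \ker(\rho_j)$$

and so $\rho_i \perp \rho_j$. Also, if $v = s_1 + \cdots + s_k$ with $s_i \in S_i$, then 

$$\iota v = s_1 + \cdots + s_k = \rho_1(v) + \cdots + \rho_k(v) = \sum_i \rho_i(v)$$

and so $\iota = \rho_1 + \cdots + \rho_k$ is a resolution of the identity. ■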



Projections and Diagonalizability 

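Now let us consider a linear combination 

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

where $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity. Since any vector 
$v \in V$ has the form 

$$v = s_1 + \cdots + s_k$$

where $s_i \in \mathrm{im}(\rho_i)$, we have 

$$\tau(v) = \sum_i \lambda_i\rho_i(v) = \sum_i \lambda_i s_i$$

Thus, the action of $\tau$ has the particularly simple form 

(8.9)  $v = s_1 + \cdots + s_k \Rightarrow \tau(v) = \lambda_1 s_1 + \cdots + \lambda_k s_k$

Moreover, we have the following. 

Theorem 8.18 A linear operator $\tau \in \mathcal{L}(V)$ is diagonalizable if and only if 
it has the form 

(8.10)  $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$

for some resolution of the identity $\rho_1 + \cdots + \rho_k = \iota$. Moreover, if $\tau$ 
has the form (8.10), where the $\lambda_i$'s are distinct and the $\rho_i$'s are 
nonzero, then the $\lambda_i$'s are the eigenvalues of $\tau$, and $\mathrm{im}(\rho_i)$ is the 
eigenspace of $\tau$ associated with $\lambda_i$. 

Proof. Suppose that $\tau$ is diagonalizable, and that $\lambda_1, \ldots, \lambda_k$ are the 
distinct eigenvalues of $\tau$. Then we have 

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

According to Theorem 8.17, the projections $\rho_i$ on $\mathcal{E}_{\lambda_i}$ along 

$$\mathcal{E}_{\lambda_1} \oplus \cdots \oplus \hat{\mathcal{E}}_{\lambda_i} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

form a resolution of the identity. Moreover, if $v \in \mathcal{E}_{\lambda_i}$, then 

$$\tau(v) = \lambda_i v = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)(v)$$

from which we deduce that $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$. 

Conversely, suppose that $\tau$ has the form (8.10). Then, using 
Theorem 8.14 to combine any projections that share a coefficient, and 
discarding any zero terms, we may assume that the $\lambda_i$'s are distinct 
and the $\rho_i$'s are nonzero. Also, 

$$V = \mathrm{im}(\rho_1) \oplus \cdots \oplus \mathrm{im}(\rho_k)$$

In view of (8.9), we have, for any $s_i \in \mathrm{im}(\rho_i)$, 

$$\tau(s_i) = \lambda_i s_i$$

which shows that $\lambda_i$ is an eigenvalue of $\tau$, and that $\mathrm{im}(\rho_i) \subseteq \mathcal{E}_{\lambda_i}$. 
To see the reverse inclusion, let $v = s_1 + \cdots + s_k \in \mathcal{E}_{\lambda_i}$, where 
$s_j \in \mathrm{im}(\rho_j)$. Then 

$$\lambda_i(s_1 + \cdots + s_k) = \lambda_i v = \tau(v) = \lambda_1 s_1 + \cdots + \lambda_k s_k$$

which implies that $(\lambda_i - \lambda_j)s_j = 0$ for all $j$, and since the $\lambda_j$'s are 
distinct, we have $s_j = 0$ for $j \ne i$. Thus, $v = s_i \in \mathrm{im}(\rho_i)$, and so 
$\mathrm{im}(\rho_i) = \mathcal{E}_{\lambda_i}$. This shows that 

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

and so $\tau$ is diagonalizable. 

Finally, we observe that if $\lambda$ is an eigenvalue of $\tau$, with 
eigenvector $v = s_1 + \cdots + s_k$, then 

$$\lambda(s_1 + \cdots + s_k) = \lambda v = \tau(v) = \lambda_1 s_1 + \cdots + \lambda_k s_k$$

and so $(\lambda - \lambda_i)s_i = 0$ for all $i$. Therefore, if $\lambda \ne \lambda_i$ for all $i$, we have 
$s_i = 0$ for all $i$, that is, $v = 0$, which is not possible. Hence, all 
eigenvalues are among the coefficients $\lambda_i$. ■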
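
To see a decomposition of the form (8.10) on a concrete matrix, consider $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, whose eigenvalues are 1 and 3. The sketch below is an illustration under an assumption not made in the text: for two distinct eigenvalues it takes the candidate projections to be $\rho_1 = (A - \lambda_2 I)/(\lambda_1 - \lambda_2)$ and $\rho_2 = (A - \lambda_1 I)/(\lambda_2 - \lambda_1)$, and then verifies directly that they form a resolution of the identity with $A = \lambda_1\rho_1 + \lambda_2\rho_2$. The helper names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    """Scalar multiple of a matrix."""
    return [[c * x for x in row] for row in a]


A = [[2, 1], [1, 2]]
I = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
lam1, lam2 = 1, 3

# Candidate projections on the two eigenspaces (assumed formula, checked below).
rho1 = scale(Fraction(1, lam1 - lam2), add(A, scale(-lam2, I)))
rho2 = scale(Fraction(1, lam2 - lam1), add(A, scale(-lam1, I)))

print(matmul(rho1, rho1) == rho1, matmul(rho2, rho2) == rho2)    # True True
print(matmul(rho1, rho2) == ZERO, add(rho1, rho2) == I)          # True True
print(add(scale(lam1, rho1), scale(lam2, rho2)) == A)            # True
```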

Definition The set of eigenvalues of a linear operator $\tau$ on a finite 
dimensional vector space $V$ is called the spectrum of $\tau$. If $\tau$ is 
diagonalizable, and 

(8.11)  $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$

where $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, the $\lambda_i$'s are 
distinct, and the $\rho_i$'s are nonzero, then (8.11) is called the spectral 
resolution of the operator $\tau$. □



Projections and Invariance 

There is a connection between projections and the notions of 
invariant subspace and reducing pair for a linear operator $\tau$. 

Theorem 8.19 Let $\tau \in \mathcal{L}(V)$. 

1) If $S$ is an invariant subspace under $\tau$, then $\rho\tau\rho = \tau\rho$ for all 
projections $\rho$ on $S$. 

2) If $S$ is a subspace of $V$, and if $\rho\tau\rho = \tau\rho$ for any projection $\rho$ 
on $S$, then $S$ is invariant under $\tau$. 

Proof. To prove part (1), let $S$ be invariant under $\tau$, and let $\rho$ be 
projection on $S$ along $T$, whence $V = S \oplus T$. Now, let $v = s + t \in V$, 
where $s \in S$ and $t \in T$. Then, since $\rho$ fixes $S$, 

$$\rho\tau\rho(v) = \rho\tau(s) = \tau(s) = \tau\rho(v)$$

and so $\rho\tau\rho = \tau\rho$. As to part (2), suppose that $\rho\tau\rho = \tau\rho$, where $\rho$ is 
projection on $S$ along $T$. Let $s \in S$. Then, since $\rho$ fixes $S$, 

$$\rho\tau(s) = \rho\tau\rho(s) = \tau\rho(s) = \tau(s)$$

and so $\rho$ fixes $\tau(s)$, which implies that $\tau(s) \in S$. Thus, $S$ is 
invariant under $\tau$. ■

Theorem 8.20 Let $V = S \oplus T$. Then a linear operator $\tau \in \mathcal{L}(V)$ is 
reduced by the pair $(S,T)$ if and only if $\tau\rho = \rho\tau$, where $\rho$ is 
projection on $S$ along $T$. 

Proof. Suppose first that $\tau\rho = \rho\tau$, where $\rho$ is projection on $S$ 
along $T$. Since $v \in S$ if and only if $\rho(v) = v$, we have for $s \in S$, 

$$\rho\tau(s) = \tau\rho(s) = \tau(s)$$

and so $\rho$ fixes $\tau(s)$, which implies that $\tau(s) \in S$. Hence, $S$ is 
invariant under $\tau$. Also, $v \in T$ if and only if $\rho(v) = 0$, and so, for 
$t \in T$, 

$$\rho\tau(t) = \tau\rho(t) = 0$$

implies that $\tau(t) \in T$. Hence, $T$ is invariant under $\tau$. 

Conversely, suppose that $(S,T)$ reduces $\tau$. Then the projection 
operator $\rho$ fixes vectors in $S$, and sends vectors in $T$ to 0. Hence, 
for $s \in S$ and $t \in T$, since $\tau(s) \in S$ and $\tau(t) \in T$, we have 

$$\rho\tau(s) = \tau(s) = \tau\rho(s)$$

and 

$$\rho\tau(t) = 0 = \tau\rho(t)$$

which imply that $\rho\tau = \tau\rho$. ■



EXERCISES 

1. A linear operator $\tau \in \mathcal{L}(V)$ is said to be nonderogatory if its 
minimal polynomial is equal to its characteristic polynomial. 
Prove that $\tau$ is nonderogatory if and only if $V$ is a cyclic 
module. 

2. Show that the eigenspace of an eigenvalue $\lambda$ is a subspace of $V$. 

3. Prove directly that the eigenvalues of a matrix are invariants 
under similarity. 

4. Prove that the eigenvalues of a matrix do not form a complete set 
of invariants under similarity. 

5. Show that not all matrices (hence linear operators) have 
eigenvalues in the base field. 

6. Show that the set $\mathcal{C}_{i,j}$ in (8.6) is an ordered basis for $\langle v_{i,j} \rangle$. 

7. Show that $\tau \in \mathcal{L}(V)$ is invertible if and only if 0 is not an 
eigenvalue of $\tau$. 

8. Let $A$ be an $n \times n$ matrix over an algebraically closed field $F$, 
such as the complex field $\mathbb{C}$. Thus, all of the roots of the 
characteristic polynomial lie in $F$. Prove that $\det(A)$ is the 
product of the eigenvalues of $A$. Formulate a statement in this 
regard about linear operators. 

9. Show that if $\lambda$ is an eigenvalue of $\tau$, then $p(\lambda)$ is an eigenvalue 
of $p(\tau)$, for any polynomial $p(x)$. Also, if $\lambda \ne 0$, then $\lambda^{-1}$ is 
an eigenvalue for $\tau^{-1}$. 

An operator $\tau \in \mathcal{L}(V)$ is nilpotent if $\tau^n = 0$ for some $n \in \mathbb{N}$. 

10. a) Show that if $\tau$ is nilpotent, then 0 is an eigenvalue of $\tau$, 
and it is the only eigenvalue of $\tau$. 

b) Find an operator $\tau$ that has 0, and only 0 as an eigenvalue, 
but is not nilpotent. 

11. Show that if $\tau, \sigma \in \mathcal{L}(V)$, then $\tau\sigma$ and $\sigma\tau$ have the same 
eigenvalues. 

12. Suppose that $\tau, \sigma \in \mathcal{L}(V)$. Show that if $\tau\sigma = \sigma\tau$, then $\tau$ and 
$\sigma$ have a common eigenvector. 

13. Let $\rho$ be a projection. Show that $\ker(\iota - \rho) = \mathrm{im}(\rho)$ and 
$\mathrm{im}(\iota - \rho) = \ker(\rho)$. 

14. Complete the details of Example 8.1. 

15. Find projections $\rho$ and $\sigma$ for which $\rho\sigma$ is a nonzero projection, 
but $\sigma\rho$ is not a projection. 

16. (Halmos) 

a) Find a linear operator $\tau$ that is not idempotent, but for which 
$\tau^2(\iota - \tau) = 0$. 

b) Find a linear operator $\tau$ that is not idempotent, but for which 
$\tau(\iota - \tau)^2 = 0$. 

c) Prove that if $\tau^2(\iota - \tau) = \tau(\iota - \tau)^2 = 0$, then $\tau$ is idempotent. 

17. An involution is a linear operator $\theta$ for which $\theta^2 = \iota$. If $\tau$ is 
an idempotent, what can you say about $2\tau - \iota$? Construct a 
one-to-one correspondence between the set of idempotents on $V$ and 
the set of involutions. 

The Trace of a Matrix 

18. Let $A$ be an $n \times n$ matrix over an algebraically closed field $F$, 
such as the complex field $\mathbb{C}$. Thus, all of the roots of the 
characteristic polynomial lie in $F$. The trace of $A$, denoted 
$\mathrm{tr}(A)$, is the sum of the elements on the main diagonal of $A$. 
Verify the following statements. 

a) $\mathrm{tr}(rA) = r\,\mathrm{tr}(A)$, for $r \in F$ 

b) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$ 

c) $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ 

d) the trace is an invariant under similarity 

e) the trace of $A$ is the sum of the eigenvalues of $A$. 

Formulate a definition of the trace of a linear operator, show that 
it is well-defined, and relate this concept to the eigenvalues of the 
operator. 

19. Use the concept of the trace of a matrix, as defined in the previous 
exercise, to prove that there are no matrices $A, B \in \mathcal{M}_n(\mathbb{C})$ for 
which $AB - BA = I$. 

20. Let $T: \mathcal{M}_n(F) \to F$ be a function with the following properties. 
For all matrices $A, B \in \mathcal{M}_n(F)$, and $r \in F$, 

1) $T(rA) = rT(A)$ 

2) $T(A + B) = T(A) + T(B)$ 

3) $T(AB) = T(BA)$ 

Show that there exists an $s \in F$ for which $T(A) = s\,\mathrm{tr}(A)$, for all 
$A \in \mathcal{M}_n(F)$. 

Simultaneous Diagonalizability 

A pair of linear operators $\sigma, \tau \in \mathcal{L}(V)$ are simultaneously 
diagonalizable if there is an ordered basis $\mathcal{B}$ for $V$ for which $[\tau]_{\mathcal{B}}$ 
and $[\sigma]_{\mathcal{B}}$ are both diagonal. 

21. The purpose of this exercise is to prove that two diagonalizable 
operators $\sigma$ and $\tau$ are simultaneously diagonalizable if and only 
if they commute, that is, $\sigma\tau = \tau\sigma$. 

a) To prove necessity, suppose that $\mathcal{B}$ is a basis of eigenvectors 
for both $\sigma$ and $\tau$. Show that $\tau\sigma$ and $\sigma\tau$ agree on the 
vectors in $\mathcal{B}$. 

b) To prove sufficiency, suppose that $\sigma\tau = \tau\sigma$. Show that the 
eigenspaces of $\tau$ are invariant under $\sigma$. 

c) If $\sigma_i$ is the restriction of $\sigma$ to the ith eigenspace of $\tau$, show 
that $\sigma_i$ is diagonalizable. Hint: consider the minimal 
polynomials of $\sigma$ and $\sigma_i$. 

d) Use the results of parts (b) and (c) to complete the proof. 




CHAPTER 9 

Real and Complex 
Inner Product Spaces 



Contents: Introduction. Norm and Distance. Isometries. 
Orthogonality. Orthogonal and Orthonormal Sets. The Projection 
Theorem. The Gram-Schmidt Orthogonalization Process. The Riesz 
Representation Theorem. Exercises. 

Introduction 

We now turn to a discussion of real or complex vector spaces that 
have an additional function defined on them, called an inner product, as 
described in the upcoming definition. Thus, in this chapter, $F$ will 
denote either the real or complex field. If $r$ is a complex number, then 
$\bar{r}$ denotes the complex conjugate of $r$. 

Definition Let $V$ be a vector space over $F$, where $F = \mathbb{R}$ or $F = \mathbb{C}$. 
An inner product on $V$ is a function $\langle\,,\rangle: V \times V \to F$ with the following 
properties. 

1) (Positive definiteness) For all $v \in V$, 

$$\langle v,v \rangle \ge 0 \quad \text{and} \quad \langle v,v \rangle = 0 \text{ if and only if } v = 0$$

2) For $F = \mathbb{C}$: (Conjugate symmetry, or Hermitian symmetry) 

$$\langle u,v \rangle = \overline{\langle v,u \rangle}$$

For $F = \mathbb{R}$: (Symmetry) 

$$\langle u,v \rangle = \langle v,u \rangle$$

3) (Linearity in the first coordinate) For all $u,v,w \in V$ and $r,s \in F$ 

$$\langle ru + sv, w \rangle = r\langle u,w \rangle + s\langle v,w \rangle$$

A real (or complex) vector space $V$, together with an inner product 
defined on $V$, is called a real (or complex) inner product space. □

We will study bilinear forms ("inner products") on vector spaces 
over fields other than $\mathbb{R}$ or $\mathbb{C}$ in Chapter 13. Note that property (1) 
implies that the quantity $\langle v,v \rangle$ is always real, even if $V$ is a complex 
vector space. 

Combining properties (2) and (3), we get, in the complex case 

$$\langle w, ru + sv \rangle = \overline{\langle ru + sv, w \rangle} = \bar{r}\,\overline{\langle u,w \rangle} + \bar{s}\,\overline{\langle v,w \rangle} = \bar{r}\langle w,u \rangle + \bar{s}\langle w,v \rangle$$

This is referred to as conjugate linearity. Thus, a complex inner 
product is linear in its first coordinate and conjugate linear in its second 
coordinate. This is often described by saying that the inner product is 
sesquilinear. (Sesqui means one and a half times.) 

In the real case ($F = \mathbb{R}$), the inner product is linear in both 
coordinates, a property referred to as bilinearity. 

Example 9.1 

1) The vector space $\mathbb{R}^n$ is an inner product space under the 
standard inner product, or dot product, defined by 

$$\langle (r_1, \ldots, r_n), (s_1, \ldots, s_n) \rangle = r_1 s_1 + \cdots + r_n s_n$$

The inner product space $\mathbb{R}^n$ is often called n-dimensional 
Euclidean space. 

2) The vector space $\mathbb{C}^n$ is an inner product space under the 
standard inner product, defined by 

$$\langle (r_1, \ldots, r_n), (s_1, \ldots, s_n) \rangle = r_1\bar{s}_1 + \cdots + r_n\bar{s}_n$$

This inner product space is often called n-dimensional unitary 
space. 

3) The vector space $V(n,2)$ of all binary n-tuples is an inner product 
space, under the inner product 

$$\langle (r_1, \ldots, r_n), (s_1, \ldots, s_n) \rangle = (r_1 s_1 + \cdots + r_n s_n) \bmod 2$$

4) The vector space $C[a,b]$ of all continuous complex-valued 
functions on the closed interval $[a,b]$ is an inner product space 
under the inner product 

$$\langle f,g \rangle = \int_a^b f(x)\overline{g(x)}\,dx$$ □

Example 9.2 One of the most important inner product spaces is the 
vector space $\ell^2$ of all complex sequences $(s_n)$ with the property that 
$\sum |s_n|^2 < \infty$, under the inner product 

$$\langle (s_n), (t_n) \rangle = \sum_{n=0}^{\infty} s_n\bar{t}_n$$

Of course, for this inner product to make sense, the sum on the right 
must converge. To see this, observe that since $(s_n), (t_n) \in \ell^2$, 

$$s = \sum_{n=0}^{\infty} |s_n|^2 < \infty \quad \text{and} \quad t = \sum_{n=0}^{\infty} |t_n|^2 < \infty$$

Now, 

$$0 \le (|s_n| - |t_n|)^2 = |s_n|^2 - 2|s_n||t_n| + |t_n|^2$$

and so 

$$2|s_n t_n| \le |s_n|^2 + |t_n|^2$$

which gives 

$$\left|\sum_{n=0}^{\infty} s_n\bar{t}_n\right| \le \sum_{n=0}^{\infty} |s_n t_n| \le \frac{1}{2}\sum_{n=0}^{\infty} |s_n|^2 + \frac{1}{2}\sum_{n=0}^{\infty} |t_n|^2 = \frac{1}{2}(s + t) < \infty$$

We leave it to the reader to verify that $\ell^2$ is an inner product space. □

The following simple result is quite useful and easy to prove. 

Lemma 9.1 If $V$ is an inner product space, and $\langle u,x \rangle = \langle v,x \rangle$ for all 
$x \in V$, then $u = v$. ■

Note that a vector subspace S of an inner product space V is 
also an inner product space under the restriction of the inner product of 
V to S. 



Norm and Distance 

If $V$ is an inner product space, then we can define the norm, or 
length, of each $v \in V$ by 

(9.1)  $\|v\| = \sqrt{\langle v,v \rangle}$

A vector $v \in V$ is a unit vector if $\|v\| = 1$. 

Here are the basic properties of the norm. 

Theorem 9.2 

1) $\|v\| \ge 0$ and $\|v\| = 0$ if and only if $v = 0$. 

2) $\|rv\| = |r|\,\|v\|$, for all $r \in F$, $v \in V$ 

3) (The Cauchy-Schwarz inequality) For all $u,v \in V$, 

$$|\langle u,v \rangle| \le \|u\|\,\|v\|$$

with equality if and only if one of $u$ and $v$ is a scalar multiple 
of the other. 

4) (The triangle inequality) For all $u,v \in V$, 

$$\|u + v\| \le \|u\| + \|v\|$$

with equality if and only if one of $u$ and $v$ is a nonnegative 
scalar multiple of the other. 

5) For all $u,v,x \in V$, 

$$\|u - v\| \le \|u - x\| + \|x - v\|$$

6) For all $u,v \in V$, 

$$\big|\,\|u\| - \|v\|\,\big| \le \|u - v\|$$

7) (The parallelogram law) For all $u,v \in V$, 

$$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$

Proof. We prove only (3) and (4). To prove (3), we proceed as follows. 
If either $u$ or $v$ is zero, the result follows. Assume that $u,v \ne 0$, 
and choose a scalar $c \in F$ with $|c| = 1$ for which $c\langle v,u \rangle = |\langle v,u \rangle|$. 
Then, for any real number $r \in \mathbb{R}$, 

$$0 \le \|u + rcv\|^2$$

$$= \langle u + rcv, u + rcv \rangle$$

$$= \langle u,u \rangle + r\bar{c}\langle u,v \rangle + rc\langle v,u \rangle + r^2|c|^2\langle v,v \rangle$$

$$= \langle u,u \rangle + r\,\overline{c\langle v,u \rangle} + rc\langle v,u \rangle + r^2\langle v,v \rangle$$

$$= \langle u,u \rangle + 2r|\langle v,u \rangle| + r^2\langle v,v \rangle = f(r)$$

This implies that the quadratic polynomial $f(r)$ must have nonpositive 
discriminant, that is, 

$$4|\langle v,u \rangle|^2 - 4\langle v,v \rangle\langle u,u \rangle \le 0$$

from which the Cauchy-Schwarz inequality follows. Furthermore, if 
equality holds, then the discriminant is zero, and so there exists a real 
number $r$ for which $f(r) = 0$, that is, $0 = \|u + rcv\|$ and so $u + rcv = 0$, 
which implies that $u$ is a scalar multiple of $v$. (If one of $u$ and $v$ is a 
scalar multiple of the other, then equality is easily seen to hold.) 

To prove the triangle inequality, we use the Cauchy-Schwarz 
inequality to get 

$$\|u + v\|^2 = \langle u + v, u + v \rangle$$

$$= \langle u,u \rangle + \langle u,v \rangle + \langle v,u \rangle + \langle v,v \rangle$$

$$\le \|u\|^2 + 2|\langle u,v \rangle| + \|v\|^2$$

$$\le \|u\|^2 + 2\|u\|\,\|v\| + \|v\|^2$$

$$= (\|u\| + \|v\|)^2$$

from which the triangle inequality follows. The proof of the statement 
concerning equality is left to the reader. ■

Any vector space $V$, together with a function $\|\cdot\|: V \to \mathbb{R}$ that 
satisfies properties (1), (2) and (4) of Theorem 9.2, is called a normed 
linear space. (And the function $\|\cdot\|$ is called a norm.) Thus, any 
inner product space is a normed linear space, under the norm given by 
(9.1). 

It is interesting to observe that the inner product on V can be 
recovered from the norm. 

Theorem 9.3 

1) If $V$ is a real inner product space, then 

$$\langle u,v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right)$$

2) If $V$ is a complex inner product space, then 

$$\langle u,v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right) + \frac{1}{4}i\left(\|u + iv\|^2 - \|u - iv\|^2\right)$$

The formulas in Theorem 9.3 are known as the polarization 
identities. The norm can be used to define the distance between any 
two vectors in an inner product space. 
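
The complex polarization identity can be spot-checked numerically in $\mathbb{C}^n$ with the standard inner product of Example 9.1, which is linear in the first coordinate and conjugate linear in the second. The sketch below uses hypothetical function names and is only a sanity check on one pair of vectors.

```python
def inner(u, v):
    """Standard inner product on C^n: linear in u, conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def norm_sq(u):
    """Squared norm <u, u>, which is always a real number."""
    return inner(u, u).real


def polarization(u, v):
    """Recover <u, v> from norms alone, via the complex polarization identity."""
    plus = [a + b for a, b in zip(u, v)]
    minus = [a - b for a, b in zip(u, v)]
    plus_i = [a + 1j * b for a, b in zip(u, v)]
    minus_i = [a - 1j * b for a, b in zip(u, v)]
    real_part = (norm_sq(plus) - norm_sq(minus)) / 4
    imag_part = (norm_sq(plus_i) - norm_sq(minus_i)) / 4
    return real_part + 1j * imag_part


u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
print(abs(polarization(u, v) - inner(u, v)) < 1e-12)  # True
```

Note that the sign of the second term depends on which coordinate is taken to be linear; with the opposite convention the roles of $u + iv$ and $u - iv$ would be exchanged.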

Definition Let $V$ be an inner product space. We define the distance 
$d(u,v)$ between any two vectors $u$ and $v$ in $V$ by 

(9.2)  $d(u,v) = \|u - v\|$ □

Here are the basic properties of distance. 

Theorem 9.4 

1) $d(u,v) \ge 0$ and $d(u,v) = 0$ if and only if $u = v$ 

2) (Symmetry) $d(u,v) = d(v,u)$ 

3) (Triangle inequality) 

$$d(u,v) \le d(u,w) + d(w,v)$$ □

Any nonempty set $V$, together with a function $d: V \times V \to \mathbb{R}$ that 
satisfies the properties of Theorem 9.4, is called a metric space, and the 
function $d$ is called a metric on $V$. Thus, any inner product space is 
a metric space under the metric (9.2). 

Before continuing, we should make a few remarks about our goals 
in this and the next chapter. The presence of an inner product, and 
hence a metric, raises a host of topological issues, related to the notion 
of convergence. We say that a sequence $(v_n)$ of vectors in an inner 
product space V converges to $v \in V$ if 

$$\lim_{n\to\infty} d(v_n, v) = 0$$

or, equivalently, if 

$$\lim_{n\to\infty} \|v_n - v\| = 0$$

Some of the more important concepts related to convergence are 
closedness and closures, completeness, and the continuity of linear 
operators and linear functionals. 

In the finite dimensional case, the situation is very 
straightforward — all subspaces are closed, all inner product spaces are 
complete, and all linear operators and functionals are continuous. 
However, in the infinite dimensional case, things are not as simple. 

Our goals in this chapter and the next are to describe some of the 
basic properties of inner product spaces — both finite and infinite 
dimensional, and then to discuss certain special types of operators 
(normal, unitary and self-adjoint), in the finite dimensional case only. 
To achieve the latter goal as rapidly as possible, we will postpone a 
discussion of topological properties until Chapter 13. This means that 
we must describe some results for the finite dimensional case only in 
this chapter, deferring the infinite dimensional case to Chapter 13. 



Isometries 

An isomorphism of vector spaces preserves the vector space 
operations. The corresponding concept for inner product spaces is the 
following. 

Definition Let V and W be inner product spaces, and let 
$\tau \in \mathcal{L}(V,W)$. 

1) $\tau$ is an isometry if it preserves the inner product, that is, if 

$$\langle \tau(u), \tau(v)\rangle = \langle u,v\rangle$$

for all $u,v \in V$. 

2) A bijective isometry is called an isometric isomorphism. When 
$\tau : V \to W$ is an isometric isomorphism, we say that V and W are 
isometrically isomorphic. □

It is not hard to show that an isometry is injective, and so it is an 
isometric isomorphism provided it is also surjective. Moreover, if 
$\dim(V) = \dim(W) < \infty$, injectivity implies surjectivity, and so the 
concepts of isometry and isometric isomorphism are equivalent. On the 
other hand, the following example shows that this is not the case for 
infinite dimensional inner product spaces. 

Example 9.3 Let $\tau : \ell^2 \to \ell^2$ be defined by 

$$\tau(x_1, x_2, x_3, \ldots) = (0, x_1, x_2, \ldots)$$

(This is the right shift operator.) Then $\tau$ is an isometry, but it is 
clearly not surjective. □

Theorem 9.5 A linear transformation $\tau \in \mathcal{L}(V,W)$ is an isometry if 
and only if it preserves the norm, that is, if and only if 

$$\|\tau(v)\| = \|v\|$$

for all $v \in V$. 

Proof. Clearly, an isometry preserves the norm. The converse follows 
from Theorem 9.3. In the real case, if $\tau$ preserves the norm, then 

$$\begin{aligned}
\langle \tau(u), \tau(v)\rangle &= \tfrac14\left(\|\tau(u)+\tau(v)\|^2 - \|\tau(u)-\tau(v)\|^2\right) \\
&= \tfrac14\left(\|\tau(u+v)\|^2 - \|\tau(u-v)\|^2\right) \\
&= \tfrac14\left(\|u+v\|^2 - \|u-v\|^2\right) \\
&= \langle u,v\rangle
\end{aligned}$$

and so $\tau$ is an isometry. The complex case is similar. ∎

The next result points out one of the main differences between real 
and complex inner product spaces. 

Theorem 9.6 Let V be an inner product space, and let $\tau \in \mathcal{L}(V)$. 

1) If $\langle \tau(v), w\rangle = 0$ for all $v, w \in V$, then $\tau = 0$. 

2) If V is a complex inner product space, and $\langle \tau(v), v\rangle = 0$ for all 
$v \in V$, then $\tau = 0$. 

3) Part (2) does not hold in general for real inner product spaces. 

Proof. Part (1) follows directly from Lemma 9.1. As for part (2), let 
$v = rx + y$, for $x,y \in V$ and $r \in F$. Then 

$$\begin{aligned}
0 &= \langle \tau(rx+y), rx+y\rangle \\
&= |r|^2\langle \tau(x),x\rangle + \langle \tau(y),y\rangle + r\langle \tau(x),y\rangle + \bar{r}\langle \tau(y),x\rangle \\
&= r\langle \tau(x),y\rangle + \bar{r}\langle \tau(y),x\rangle
\end{aligned}$$

Setting $r = 1$ gives 

$$\langle \tau(x),y\rangle + \langle \tau(y),x\rangle = 0$$

and setting $r = i$ gives 

$$\langle \tau(x),y\rangle - \langle \tau(y),x\rangle = 0$$

These two equations imply that $\langle \tau(x),y\rangle = 0$ for all $x,y \in V$, and so 
$\tau = 0$ by part (1). As for part (3), consider the real inner product 
space $V = \mathbb{R}^2$, and let $\tau \in \mathcal{L}(V)$ be defined by $\tau(e_1) = e_2$ and 
$\tau(e_2) = -e_1$. Thus, $\tau$ is rotation by 90°, and $\langle \tau(v), v\rangle = 0$ for all v, but 
$\tau \ne 0$. ∎
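The counterexample in part (3) is easy to check by hand, and a short sketch makes the contrast with the complex case concrete. The function names and sample vectors below are illustrative only. The last lines apply the same matrix to a vector in $\mathbb{C}^2$ (with the standard complex inner product, an assumption of this sketch) to show that $\langle \tau(v), v\rangle$ need not vanish there.

```python
def rotate90(v):
    """tau(e1) = e2 and tau(e2) = -e1, so tau(x, y) = (-y, x)."""
    x, y = v
    return (-y, x)

def dot(u, v):
    """Standard real inner product on R^2."""
    return sum(a * b for a, b in zip(u, v))

def inner(u, v):
    """Standard complex inner product, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

# Real case: <tau(v), v> = -y*x + x*y = 0 for every v, although tau != 0.
samples = [(1, 0), (0, 1), (3, -4), (2.5, 7)]
values = [dot(rotate90(v), v) for v in samples]
print(values)

# Complex case: the same matrix no longer satisfies <tau(v), v> = 0.
z = (1, 1j)
print(inner(rotate90(z), z))
```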

Orthogonality 

The presence of an inner product allows us to define the concept of 
orthogonality, or perpendicularity. 

Definition Let V be an inner product space. 

1) Two vectors $u,v \in V$ are said to be orthogonal if $\langle u,v\rangle = 0$. In 
this case, we write $u \perp v$. 

2) If S and T are subsets of V, and $s \perp t$ for all $s \in S$ and 
$t \in T$, we say that S is orthogonal to T, and write $S \perp T$. 

3) The orthogonal complement of a subset $S \subseteq V$ is the set 

$$S^{\perp} = \{v \in V \mid v \perp S\}$$ □

The following result is easily proved. 

Theorem 9.7 Let V be an inner product space. 

1) For any subset $S \subseteq V$, the orthogonal complement $S^{\perp}$ of S is a 
subspace of V. 

2) For any subspace S of V, $S \cap S^{\perp} = \{0\}$. ∎

Orthogonal and Orthonormal Sets 

Definition A nonempty collection $O = \{u_i \mid i \in K\}$ of vectors in an 
inner product space is said to be an orthogonal set if $u_i \perp u_j$ for all 
$i \ne j \in K$. If, in addition, each vector $u_i$ is a unit vector, the set O is 
an orthonormal set. Thus, a set is orthonormal if 

$$\langle u_i, u_j\rangle = \delta_{i,j}$$

for all $i, j \in K$, where $\delta_{i,j}$ is the Kronecker delta function. □

Note that given any nonzero vector $v \in V$, we may obtain a unit 
vector u by simply multiplying v by the reciprocal of the norm of v 

$$u = \frac{1}{\|v\|}\, v$$

Thus, it is a simple matter to construct an orthonormal set from an 
orthogonal set of nonzero vectors. 

Theorem 9.8 Any orthogonal set of nonzero vectors in V is linearly 
independent. 

Proof. Let $O = \{u_i \mid i \in K\}$ be an orthogonal set of nonzero vectors, 
and suppose that 

$$r_1u_1 + \cdots + r_nu_n = 0$$

Then, for any $k = 1,\ldots,n$, 

$$0 = \langle r_1u_1 + \cdots + r_nu_n, u_k\rangle = r_k\langle u_k, u_k\rangle$$

and so $r_k = 0$, for all k. Hence, O is linearly independent. ∎

Definition A maximal orthonormal set in an inner product space V is 
called a Hilbert basis for V. □

Zorn’s lemma can be used to show that any nontrivial inner 
product space has a Hilbert basis. We leave the details to the reader. 

Extreme care must be taken here not to confuse the concepts of a 
basis for a vector space and a Hilbert basis for an inner product space. 
To avoid confusion, a vector space basis, that is, a maximal linearly 
independent set of vectors, is referred to as a Hamel basis. The 
following example shows that, in general, the two concepts of basis are 
not the same. 

Example 9.4 Let $V = \ell^2$ and let M be the set of all vectors of the 
form 

$$e_i = (0,\ldots,0,1,0,\ldots)$$

where $e_i$ has a 1 in the ith coordinate, and 0s elsewhere. Clearly, M 
is an orthonormal set. Moreover, it is maximal. For if $x = (x_n) \in \ell^2$ 
has the property that $x \perp M$, then 

$$x_i = \langle x, e_i\rangle = 0$$

for all i, and so $x = 0$. Hence, no nonzero vector $x \notin M$ is orthogonal 
to M. This shows that M is a Hilbert basis for the inner product 
space $\ell^2$. 

On the other hand, the vector space span of M is the subspace S 
of all sequences in $\ell^2$ that have finite support, that is, have only a 
finite number of nonzero terms, and since $\operatorname{span}(M) = S \ne \ell^2$, we see 
that M is not a Hamel basis for the vector space $\ell^2$. □

We will show in Chapter 13 that all Hilbert bases for an inner 
product space have the same cardinality, and so we can define the 
Hilbert dimension of an inner product space to be that cardinality. 
Once again, to avoid confusion, the cardinality of any Hamel basis for 
V is referred to as the Hamel dimension of V. The Hamel dimension 
is, in general, not the same as the Hilbert dimension. However, as we 
will now show, they are equal when the Hamel dimension is finite. 



Definition Let V be an inner product space with finite Hamel 
dimension. A Hamel basis for V that is also an orthogonal set is 
called an orthogonal Hamel basis for V, and a Hamel basis for V that 
is also an orthonormal set is called an orthonormal Hamel basis for V. 
(These concepts are defined only for finite dimensional vector spaces.) □



Theorem 9.9 Let V be an inner product space with finite Hamel 
dimension. Then any Hilbert basis is a Hamel basis. Hence, the 
Hilbert dimension of V is the same as the Hamel dimension. 



Proof. Let V have Hamel dimension n. Since orthonormal sets of 
vectors in V are linearly independent, their size cannot exceed n. In 
particular, a maximal orthonormal set has size at most n. 

If $O = \{u_1,\ldots,u_k\}$ is a maximal orthonormal set with $k < n$, then 
there exists a vector $v \in V$ for which $O \cup \{v\}$ is linearly independent. 
If 

$$w = v + r_1u_1 + \cdots + r_ku_k$$

then $\langle w, u_i\rangle = 0$ if and only if 

$$0 = \langle w, u_i\rangle = \langle v, u_i\rangle + r_i\langle u_i, u_i\rangle$$

or, equivalently 

$$r_i = -\frac{\langle v, u_i\rangle}{\langle u_i, u_i\rangle} = -\langle v, u_i\rangle$$

Thus, by defining $r_i$ in this way, we obtain a vector w that is 
orthogonal to all vectors in O. Moreover, $w \ne 0$, since $O \cup \{v\}$ is 
linearly independent. Hence, $O \cup \{w/\|w\|\}$ is an 
orthonormal set that properly contains the maximal orthonormal 
set O. This contradiction implies that $k = n$, and so O is a Hamel 
basis. ∎



It is also true that if an inner product space has finite Hilbert 
dimension, then this is also equal to its Hamel dimension. (This will 
follow from our upcoming discussion of Gram-Schmidt 
orthogonalization.) Therefore, the term finite dimensional can be 
applied unambiguously to an inner product space. We will use the term 
orthonormal basis to refer to an orthonormal Hamel basis. 

Orthonormal bases have a great advantage over arbitrary bases. 
To see this, suppose that $\mathcal{B} = \{v_1,\ldots,v_n\}$ is a basis for V. Then each 
$v \in V$ has the form 

$$v = r_1v_1 + \cdots + r_nv_n$$

In general, however, determining the coordinates $r_i$ requires solving a 
system of linear equations of size $n \times n$. 

On the other hand, suppose that $O = \{u_1,\ldots,u_n\}$ is an 
orthonormal basis for V. As before, any vector $v \in V$ has the form 

$$v = r_1u_1 + \cdots + r_nu_n$$

But now we have 

$$\langle v, u_i\rangle = \langle r_1u_1 + \cdots + r_nu_n, u_i\rangle = r_i\langle u_i, u_i\rangle = r_i$$

Thus, in the case of an orthonormal basis, we have a very simple 
procedure for finding the coordinates of any vector $v \in V$. 

Theorem 9.10 Let $O = \{u_1,\ldots,u_n\}$ be an orthonormal basis for V. 

1) For any $v \in V$, 

$$v = \langle v, u_1\rangle u_1 + \cdots + \langle v, u_n\rangle u_n$$

The coordinates $\langle v, u_i\rangle$ are called the Fourier coefficients of v 
with respect to O, and the expression for v on the right is called 
the Fourier expansion of v with respect to O. 

2) (Bessel’s identity) For any $v \in V$, 

$$\|v\|^2 = |\langle v, u_1\rangle|^2 + \cdots + |\langle v, u_n\rangle|^2$$

3) (Parseval’s identity) For $v,w \in V$, 

$$\langle v, w\rangle = \langle v, u_1\rangle\overline{\langle w, u_1\rangle} + \cdots + \langle v, u_n\rangle\overline{\langle w, u_n\rangle}$$ ∎
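To see Theorem 9.10 at work, the following sketch computes Fourier coefficients with respect to an orthonormal basis of $\mathbb{C}^2$ and checks the expansion, Bessel's identity and Parseval's identity numerically. The basis, the sample vectors and all helper names are illustrative assumptions, and the inner product is taken to be the standard one, linear in the first argument.

```python
import math

def inner(u, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

# An orthonormal basis of C^2 (chosen for illustration).
s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]

def fourier_coefficients(v, basis):
    """Coordinates of v: one inner product per basis vector, no linear system."""
    return [inner(v, u) for u in basis]

def expand(coeffs, basis):
    """Rebuild the vector sum_i coeffs[i] * basis[i]."""
    n = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(n)]

v = [2 + 1j, -1 + 3j]
w = [1j, 4]
cv = fourier_coefficients(v, basis)
cw = fourier_coefficients(w, basis)

rebuilt = expand(cv, basis)                                  # part 1
bessel = sum(abs(c) ** 2 for c in cv)                        # part 2
parseval = sum(a * b.conjugate() for a, b in zip(cv, cw))    # part 3
print(rebuilt, bessel, parseval)
```

Up to rounding, `rebuilt` should equal `v`, `bessel` should equal $\|v\|^2$, and `parseval` should equal $\langle v, w\rangle$.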

Theorem 9.10 shows clearly that orthonormal bases are a pleasure 
to work with. The following result is included primarily to establish an 
analogy with the infinite dimensional case, which we will discuss in 
Chapter 13. 

Theorem 9.11 Let V be a finite dimensional inner product space. Let 
$O = \{u_1,\ldots,u_k\}$ be an orthonormal set of vectors in V. For any 
$v \in V$, the vector 

$$\hat{v} = \langle v, u_1\rangle u_1 + \cdots + \langle v, u_k\rangle u_k$$

is called the Fourier expansion of v with respect to O. 

1) (Bessel’s inequality) For all $v \in V$, 

$$\|\hat{v}\| \le \|v\|$$

2) The set O is an (orthonormal) Hamel basis for V if and only if 
$\hat{v} = v$, for all $v \in V$. 

3) The set O is an (orthonormal) Hamel basis for V if and only if 
Bessel’s identity holds for all $v \in V$, that is, if and only if 

$$\|\hat{v}\| = \|v\|$$

for all $v \in V$. 

4) The set O is an (orthonormal) Hamel basis for V if and only if 
Parseval’s identity holds for all $v,w \in V$, that is, if and only if 

$$\langle v, w\rangle = \langle v, u_1\rangle\overline{\langle w, u_1\rangle} + \cdots + \langle v, u_k\rangle\overline{\langle w, u_k\rangle}$$

for all $v,w \in V$. ∎

The Projection Theorem 

We have seen that if S is a subspace of an inner product space 
V, then $S \cap S^{\perp} = \{0\}$. This raises the question of whether or not the 
orthogonal complement $S^{\perp}$ of a subspace S is a (vector space) 
complement of S, that is, whether or not $V = S \oplus S^{\perp}$. 

If S is a finite dimensional subspace of V, the answer is yes, but 
for infinite dimensional subspaces, S must have the topological 
property of being complete. Hence, in accordance with our goals in this 
chapter, we will postpone a discussion of the general case to Chapter 13, 
contenting ourselves here with an example to show that, in general, 
$V \ne S \oplus S^{\perp}$. 

Example 9.5 As in Example 9.4, let $V = \ell^2$ and let S be the 
subspace spanned by the vectors 

$$e_i = (0,\ldots,0,1,0,\ldots)$$

where $e_i$ has a 1 in the ith coordinate, and 0s elsewhere. Thus, S is 
the subspace of all sequences in $\ell^2$ that have finite support, that is, 
have only a finite number of nonzero terms. 

Now, if $x = (x_n) \in S^{\perp}$, then $x_i = \langle x, e_i\rangle = 0$ for all i, and so 
$x = 0$. Therefore, $S^{\perp} = \{0\}$, and 

$$S \oplus S^{\perp} = S \ne \ell^2$$ □

As the next theorem shows, in the finite dimensional case, 
orthogonal complements are also vector space complements. This 
theorem is often called the projection theorem, for reasons that will 
become apparent when we discuss projection operators. (We will 
discuss the projection theorem in the infinite dimensional case in 
Chapter 13.) 

Theorem 9.12 (The projection theorem) Let S be a finite 
dimensional subspace of an inner product space V. Then $V = S \oplus S^{\perp}$. 
That is, for any $v \in V$, there are unique vectors $s \in S$ and $s^{\perp} \in S^{\perp}$ 
for which 

$$v = s + s^{\perp}$$

Proof. Let $O = \{u_1,\ldots,u_k\}$ be an orthonormal basis for S. For each 
$v \in V$, consider the Fourier expansion 

$$\hat{v} = \langle v, u_1\rangle u_1 + \cdots + \langle v, u_k\rangle u_k$$

with respect to O. We may write 

$$v = \hat{v} + (v - \hat{v})$$

where $\hat{v} \in S$. Moreover, $v - \hat{v} \in S^{\perp}$, since 

$$\langle v - \hat{v}, u_j\rangle = \langle v, u_j\rangle - \langle \hat{v}, u_j\rangle = 0$$

Hence $V = S + S^{\perp}$. We have already observed that $S \cap S^{\perp} = \{0\}$, and 
so $V = S \oplus S^{\perp}$. ∎

According to the proof of the projection theorem, the component 
of V that lies in S is just the Fourier expansion of v with respect to 
any orthonormal basis O for S. This is pictured in Figure 9.1. 
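The decomposition in the proof can be traced on a small example. The sketch below works in $\mathbb{R}^3$ with exact rational arithmetic, takes a two dimensional subspace S with a hand-picked orthonormal basis, and splits a vector into its component in S and its component in the orthogonal complement. The subspace, the vector and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Standard real inner product."""
    return sum(a * b for a, b in zip(u, v))

def project(v, onb):
    """Fourier expansion of v with respect to an orthonormal basis of S."""
    v_hat = [Fraction(0)] * len(v)
    for u in onb:
        c = dot(v, u)  # Fourier coefficient <v, u>
        v_hat = [x + c * y for x, y in zip(v_hat, u)]
    return v_hat

# Orthonormal basis of a plane S in R^3 (illustrative choice).
onb = [
    [Fraction(1), Fraction(0), Fraction(0)],
    [Fraction(0), Fraction(3, 5), Fraction(4, 5)],
]
v = [Fraction(2), Fraction(1), Fraction(3)]

s = project(v, onb)                       # component in S
s_perp = [a - b for a, b in zip(v, s)]    # component in the orthogonal complement
print(s, s_perp)
```

Here `s_perp` is orthogonal to both basis vectors of S, and `s` and `s_perp` add back up to `v`, as the projection theorem requires.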




Definition Let V be an inner product space, and let $S_1,\ldots,S_k$ be 
subspaces of V. If 

1) $V = S_1 \oplus \cdots \oplus S_k$ 

2) $S_i \perp S_j$ for $i \ne j$ 

then we say that V is the orthogonal direct sum of $S_1,\ldots,S_k$, and 
write $V = S_1 \odot \cdots \odot S_k$. □

Theorem 9.12 states that $V = S \odot S^{\perp}$, for any finite dimensional 
subspace S of V. The following simple result is very useful. 

Theorem 9.13 Let V be an inner product space. The following are 
equivalent. 

1) $V = S \odot T$ 

2) $V = S \oplus T$ and $T = S^{\perp}$ 

3) $V = S \oplus T$ and $T \subseteq S^{\perp}$ 

Proof. Suppose (1) holds. Then $V = S \oplus T$ and $S \perp T$, which implies 
that $T \subseteq S^{\perp}$. But if $w \in S^{\perp}$, then $w = s + t$ for $s \in S$, $t \in T$, and so 

$$0 = \langle s, w\rangle = \langle s, s\rangle + \langle s, t\rangle = \langle s, s\rangle$$

showing that $s = 0$, which implies that $w \in T$. Thus, $S^{\perp} \subseteq T$ and so 
$S^{\perp} = T$. Hence, (2) holds. Of course, (2) implies (3). Finally, if (3) 
holds, then $T \subseteq S^{\perp}$, which implies that $S \perp T$, and so (1) holds. ∎

Theorem 9.14 Let V be an inner product space. 

1) If $\dim(V) < \infty$ and S is a subspace of V, then 

$$\dim(S^{\perp}) = \dim(V) - \dim(S)$$

2) If S is a finite dimensional subspace of V, then $S^{\perp\perp} = S$. 

3) If S is a subset of V and $\dim(\operatorname{span}(S)) < \infty$, then $S^{\perp\perp} = 
\operatorname{span}(S)$. 

Proof. Since $V = S \oplus S^{\perp}$, we have $\dim(V) = \dim(S) + \dim(S^{\perp})$, which 
proves part (1). As for part (2), it is clear that $S \subseteq S^{\perp\perp}$. On the other 
hand, if $v \in S^{\perp\perp}$, then by the projection theorem 

$$v = s + s'$$

where $s \in S$ and $s' \in S^{\perp}$. But $v \in S^{\perp\perp}$ implies that $0 = \langle v, s'\rangle = 
\langle s', s'\rangle$, and so $s' = 0$, showing that $v \in S$. Therefore, $S^{\perp\perp} \subseteq S$, and 
$S^{\perp\perp} = S$. We leave the proof of part (3) as an exercise. ∎

The Gram-Schmidt Orthogonalization Process 

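Given a linearly independent sequence $\mathcal{B} = (v_1, v_2, \ldots)$ in an inner 
product space V, we can easily construct an orthogonal sequence $O = 
(u_1, u_2, \ldots)$ in V, with the property that 

$$\operatorname{span}\{u_1,\ldots,u_k\} = \operatorname{span}\{v_1,\ldots,v_k\}$$

for all k. The following construction is known as the Gram-Schmidt 
orthogonalization process. 

The first step is to let $u_1 = v_1$. Next, we search for a vector $u_2$ 
of the form $u_2 = v_2 + r_1u_1$ for which $\langle u_2, u_1\rangle = 0$, that is, for which 

$$0 = \langle u_2, u_1\rangle = \langle v_2 + r_1u_1, u_1\rangle = \langle v_2, u_1\rangle + r_1\langle u_1, u_1\rangle$$

or, equivalently, 

$$r_1 = -\frac{\langle v_2, u_1\rangle}{\langle u_1, u_1\rangle}$$

Hence, defining $r_1$ by this formula, we see that the set $\{u_1, u_2\}$ is 
orthogonal, and that $\operatorname{span}\{u_1, u_2\} = \operatorname{span}\{v_1, v_2\}$. 

More generally, suppose that $\{u_1,\ldots,u_{k-1}\}$ is orthogonal, and 
that $\operatorname{span}\{u_1,\ldots,u_{k-1}\} = \operatorname{span}\{v_1,\ldots,v_{k-1}\}$. We want a vector $u_k$ of 
the form 

$$u_k = v_k + r_1u_1 + \cdots + r_{k-1}u_{k-1}$$

for which $\langle u_k, u_i\rangle = 0$ for all $i = 1,\ldots,k-1$, that is, 

$$0 = \langle u_k, u_i\rangle = \langle v_k + r_1u_1 + \cdots + r_{k-1}u_{k-1}, u_i\rangle = \langle v_k, u_i\rangle + r_i\langle u_i, u_i\rangle$$

or, equivalently, 

$$r_i = -\frac{\langle v_k, u_i\rangle}{\langle u_i, u_i\rangle}$$

for all $i = 1,\ldots,k-1$. Defining the $r_i$’s by this formula gives us the 
desired vector $u_k$. Let us summarize. 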



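Theorem 9.15 (The Gram-Schmidt orthogonalization process) Suppose 
that $\mathcal{B} = (v_1, v_2, \ldots)$ is a sequence of linearly independent vectors in an 
inner product space V. If we define 

$$u_1 = v_1, \qquad u_k = v_k - \sum_{i=1}^{k-1} \frac{\langle v_k, u_i\rangle}{\langle u_i, u_i\rangle}\, u_i \quad (k \ge 2)$$

then the sequence $O = (u_1, u_2, \ldots)$ is an orthogonal sequence of linearly 
independent vectors, with the property that 

$$\operatorname{span}\{u_1,\ldots,u_k\} = \operatorname{span}\{v_1,\ldots,v_k\}$$

for all $k = 1, 2, \ldots$. ∎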
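The recursion of Theorem 9.15 translates directly into a short routine. The sketch below is a minimal version for real vectors with rational entries, using exact arithmetic so that orthogonality can be checked without rounding error. The function names and the three input vectors are illustrative, and the routine assumes (without checking) that its input is linearly independent.

```python
from fractions import Fraction

def dot(u, v):
    """Standard real inner product."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthogonal list u_1, u_2, ... with the same partial spans.

    Assumes the input vectors are linearly independent, so no u_i is zero.
    """
    ortho = []
    for v in vectors:
        u = list(v)
        for w in ortho:
            r = dot(v, w) / dot(w, w)          # <v_k, u_i> / <u_i, u_i>
            u = [a - r * b for a, b in zip(u, w)]
        ortho.append(u)
    return ortho

rows = ([1, 1, 0], [1, 0, 1], [0, 1, 1])
vs = [[Fraction(x) for x in row] for row in rows]
us = gram_schmidt(vs)
for u in us:
    print(u)
```

Dividing each output vector by its norm would give an orthonormal sequence, at the cost of leaving exact rational arithmetic.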



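Example 9.6 Consider the inner product space $\mathbb{R}[x]$ of all polynomials 
over $\mathbb{R}$, with inner product defined by 

$$\langle p(x), q(x)\rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

Applying the Gram-Schmidt process to the sequence 
$\mathcal{B} = (1, x, x^2, x^3, \ldots)$ gives 

$$u_1(x) = 1$$

$$u_2(x) = x - \frac{\int_{-1}^{1} x\,dx}{\int_{-1}^{1} dx}\cdot 1 = x$$

$$u_3(x) = x^2 - \frac{\int_{-1}^{1} x^2\,dx}{\int_{-1}^{1} dx}\cdot 1 - \frac{\int_{-1}^{1} x^3\,dx}{\int_{-1}^{1} x^2\,dx}\cdot x = x^2 - \frac{1}{3}$$

$$u_4(x) = x^3 - \frac{\int_{-1}^{1} x^3\,dx}{\int_{-1}^{1} dx}\cdot 1 - \frac{\int_{-1}^{1} x^4\,dx}{\int_{-1}^{1} x^2\,dx}\cdot x - \frac{\int_{-1}^{1} x^3\left(x^2 - \frac13\right)dx}{\int_{-1}^{1} \left(x^2 - \frac13\right)^2 dx}\left(x^2 - \frac{1}{3}\right) = x^3 - \frac{3}{5}x$$

and so on. The polynomials in this sequence are (at least up to 
multiplicative constants) the Legendre polynomials. □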
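The computation in Example 9.6 can be reproduced mechanically. In the sketch below, a polynomial is stored as a list of rational coefficients with the constant term first, and the inner product is evaluated exactly from the fact that $\int_{-1}^{1} x^k\,dx$ equals $2/(k+1)$ for even k and 0 for odd k. The representation and the function names are choices made for this illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two coefficient lists (constant term first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """Integral of p(x) q(x) over [-1, 1]; odd powers integrate to zero."""
    prod = poly_mul(p, q)
    return sum(Fraction(2, k + 1) * c for k, c in enumerate(prod) if k % 2 == 0)

def sub_scaled(p, r, q):
    """Return p - r*q, padding both lists to a common length."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - r * b for a, b in zip(p, q)]

def monomial(k):
    """Coefficient list of x^k."""
    return [Fraction(0)] * k + [Fraction(1)]

def orthogonal_polys(count):
    """Gram-Schmidt applied to 1, x, x^2, ... under the inner product above."""
    ortho = []
    for k in range(count):
        v = monomial(k)
        u = v
        for w in ortho:
            u = sub_scaled(u, inner(v, w) / inner(w, w), w)
        ortho.append(u)
    return ortho

u1, u2, u3, u4 = orthogonal_polys(4)
print(u3, u4)
```

The lists printed for `u3` and `u4` encode $x^2 - \tfrac13$ and $x^3 - \tfrac35 x$, matching the example.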



The Riesz Representation Theorem 

If x is a vector in an inner product space V, then the function 
$\phi_x : V \to F$ defined by 

$$\phi_x(v) = \langle v, x\rangle$$

is easily seen to be a linear functional on V. The following theorem 
shows that all linear functionals on a finite dimensional inner product 
space V have this form. (We will see in Chapter 13 that, in the infinite 
dimensional case, all continuous linear functionals on V have this form.) 

Theorem 9.16 (The Riesz representation theorem) Let V be a finite 
dimensional inner product space, and let $f \in V^*$ be a linear functional 
on V. Then there exists a unique vector $x \in V$ for which 

(9.3) $f(v) = \langle v, x\rangle$ 

for all $v \in V$. 

Proof. If f is the zero functional, we may take $x = 0$, so let us assume 
that $f \ne 0$. By way of motivation, observe that if x has the desired 
property, then $\langle v, x\rangle = 0$ for all $v \in \ker(f)$. Hence, we should look for 
an x in $\ker(f)^{\perp}$. 

Note that, if $\dim(V) = n$, then $\dim(\ker(f)) = n - 1$. Hence, we 
can choose a unit vector $u \in \ker(f)^{\perp}$, and write 

$$V = \langle u\rangle \odot \ker(f)$$

Our goal is to find an $r \in F$ for which 

$$f(v) = \langle v, ru\rangle$$

for all $v \in V$. In particular, for $v = u$, we want 

$$f(u) = \langle u, ru\rangle = \bar{r}\langle u, u\rangle = \bar{r}$$

Therefore, let us take $r = \overline{f(u)}$, and so 

$$x = \overline{f(u)}\,u$$

Any vector $v \in V$ has the form $v = au + bw$, with $w \in \ker(f)$, and so 

$$\langle v, x\rangle = \langle v, \overline{f(u)}\,u\rangle = f(u)\langle v, u\rangle = f(u)a = f(au) = f(au + bw) = f(v)$$

Proof of uniqueness is left as an exercise. ∎
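For a concrete feel for the Riesz vector, the sketch below computes it in $\mathbb{C}^2$. Note that it does not follow the kernel argument of the proof. Instead it uses the equivalent formula $x = \overline{f(u_1)}\,u_1 + \cdots + \overline{f(u_n)}\,u_n$ for an orthonormal basis $u_1,\ldots,u_n$, which follows from the Fourier expansion of Theorem 9.10. The functional `f`, the test vector and the helper names are hypothetical choices for illustration.

```python
def inner(u, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def riesz_vector(f, onb):
    """Return x with f(v) = <v, x>, built as sum of conj(f(u_i)) * u_i."""
    x = [0j] * len(onb[0])
    for u in onb:
        c = complex(f(u)).conjugate()
        x = [xi + c * ui for xi, ui in zip(x, u)]
    return x

def f(v):
    """A sample linear functional on C^2 (hypothetical choice)."""
    return (2 - 1j) * v[0] + 3j * v[1]

standard_basis = [[1, 0], [0, 1]]
x = riesz_vector(f, standard_basis)

v = [1 + 1j, -2 + 0.5j]
print(x, inner(v, x), f(v))
```

The conjugates in `riesz_vector` are what make the correspondence between functionals and vectors conjugate linear rather than linear.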

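Using the Riesz representation theorem, we can define a map 
$\phi : V^* \to V$ by $\phi(f) = x$, where x is the unique vector in V for which 
(9.3) holds, that is, $\phi(f)$ is defined by 

$$f(v) = \langle v, \phi(f)\rangle$$

for all $v \in V$. Since 

$$\begin{aligned}
\langle v, \phi(rf + sg)\rangle &= (rf + sg)(v) \\
&= rf(v) + sg(v) \\
&= r\langle v, \phi(f)\rangle + s\langle v, \phi(g)\rangle \\
&= \langle v, \bar{r}\phi(f) + \bar{s}\phi(g)\rangle
\end{aligned}$$

we have 

$$\phi(rf + sg) = \bar{r}\phi(f) + \bar{s}\phi(g)$$

and so $\phi$ is conjugate linear. In addition, $\phi$ is clearly surjective, and 
it is injective, since $\phi(f) = 0$ implies that $f = 0$. Thus, the map 
$\phi : V^* \to V$ is a “conjugate isomorphism.” 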

EXERCISES 

1. Verify the statement concerning equality in the triangle inequality. 

2. Prove the parallelogram law. 

3. Prove Apollonius’ identity 

$$\|w - u\|^2 + \|w - v\|^2 = \tfrac12\|u - v\|^2 + 2\|w - \tfrac12(u + v)\|^2$$

4. Let V be an inner product space with basis $\mathcal{B}$. Show that the 
inner product is uniquely defined by the values $\langle u, v\rangle$, for all 
$u,v \in \mathcal{B}$. 

5. Prove that two vectors u and v in a real inner product space V 
are orthogonal if and only if 

$$\|u + v\|^2 = \|u\|^2 + \|v\|^2$$

6. Show that an isometry is injective. 

7. Use Zorn’s lemma to show that any nontrivial inner product space 
has a Hilbert basis. 

8. Prove Bessel’s inequality. 

9. Prove that an orthonormal set O is a basis for V if and only if 
$\hat{v} = v$, for all $v \in V$. 

10. Prove that an orthonormal set O is a basis for V if and only if 
Bessel’s identity holds for all $v \in V$, that is, if and only if 

$$\|\hat{v}\| = \|v\|$$

for all $v \in V$. 

11. Prove that an orthonormal set O is a basis for V if and only if 
Parseval’s identity holds for all $v,w \in V$, that is, if and only if 

$$\langle v, w\rangle = \langle v, u_1\rangle\overline{\langle w, u_1\rangle} + \cdots + \langle v, u_k\rangle\overline{\langle w, u_k\rangle}$$

for all $v,w \in V$. 

12. Let V be an inner product space. Prove that $S \subseteq S^{\perp\perp}$ for any 
subspace $S \subseteq V$. 

13. Let V be a finite dimensional inner product space. Prove that 
for any subset S of V, we have $S^{\perp\perp} = \operatorname{span}(S)$. 

14. Let $\mathcal{P}_3$ be the inner product space of all polynomials of degree at 
most 3, under the inner product 

$$\langle p(x), q(x)\rangle = \int_{-\infty}^{\infty} p(x)q(x)e^{-x^2}\,dx$$

Apply the Gram-Schmidt process to the basis $\{1, x, x^2, x^3\}$, thereby 
computing the first four Hermite polynomials (at least up to 
multiplicative constant). 

15. Verify uniqueness in the Riesz representation theorem. 




CHAPTER 10 

The Spectral Theorem 
for Normal Operators 

Operators, Normal Operators, Orthogonal Diagonalization, 
Orthogonal Projections, Orthogonal Resolutions of the Identity, The 
Spectral Theorem, Functional Calculus, Positive Operators, The 
Polar Decomposition of an Operator, Exercises. 



The Adjoint of a Linear Operator 

The purpose of this chapter is to study the structure of certain 
special types of linear operators on an inner product space. In order to 
define these operators, we introduce another type of adjoint (different 
from the operator adjoint of Chapter 3). We will define this adjoint in 
the finite dimensional case only, deferring the infinite dimensional case 
to Chapter 13. 

Theorem 10.1 Let V and W be finite dimensional inner product 
spaces over F, and let $\tau \in \mathcal{L}(V,W)$. Then there is a unique function 
$\tau^* : W \to V$, defined by the condition 

$$\langle \tau(v), w\rangle = \langle v, \tau^*(w)\rangle$$

for all $v \in V$ and $w \in W$. This function is in $\mathcal{L}(W,V)$, and is called 
the adjoint of $\tau$. 

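Proof. For a fixed $w \in W$, consider the function $\theta_w : V \to F$ defined by 

$$\theta_w(v) = \langle \tau(v), w\rangle$$

It is easy to verify that $\theta_w$ is a linear functional on V, and so, by the 
Riesz representation theorem, there exists a unique vector $x \in V$ for 
which 

$$\theta_w(v) = \langle \tau(v), w\rangle = \langle v, x\rangle$$

for all $v \in V$. Hence, if we set $\tau^*(w) = x$, then 

$$\langle \tau(v), w\rangle = \langle v, \tau^*(w)\rangle$$

for all $v \in V$. This establishes the existence and uniqueness of $\tau^*$. To 
show that $\tau^*$ is linear, observe that 

$$\begin{aligned}
\langle v, \tau^*(rw + sw')\rangle &= \langle \tau(v), rw + sw'\rangle \\
&= \bar{r}\langle \tau(v), w\rangle + \bar{s}\langle \tau(v), w'\rangle \\
&= \bar{r}\langle v, \tau^*(w)\rangle + \bar{s}\langle v, \tau^*(w')\rangle \\
&= \langle v, r\tau^*(w)\rangle + \langle v, s\tau^*(w')\rangle \\
&= \langle v, r\tau^*(w) + s\tau^*(w')\rangle
\end{aligned}$$

for all $v \in V$, and so 

$$\tau^*(rw + sw') = r\tau^*(w) + s\tau^*(w')$$

Hence $\tau^* \in \mathcal{L}(W,V)$. ∎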



We should make some remarks about the differences between the 
operator adjoint $\tau^{\times}$ of $\tau$, as defined in Chapter 3, and the adjoint 
$\tau^*$ that we have just defined, which is sometimes called the Hilbert 
space adjoint. In the first place, if $\tau : V \to W$, then 

$$\tau^{\times} : W^* \to V^*$$

but 

$$\tau^* : W \to V$$


These maps are shown in Figure 10.1, where $\phi_1$ and $\phi_2$ are the 
conjugate linear maps that we discussed in Chapter 9, following our 
discussion of the Riesz representation theorem. 

[Figure 10.1: the maps $\tau^{\times} : W^* \to V^*$ and $\tau^* : W \to V$, connected by $\phi_1 : V^* \to V$ and $\phi_2 : W^* \to W$] 

We can define a function $\sigma : W^* \to V^*$ by 

(10.1) $\sigma = (\phi_1)^{-1}\tau^*\phi_2$ 

Because $\sigma$ involves two conjugate linear maps (and one linear map), 
it is linear. Moreover, for all $f \in W^*$ and $v \in V$ 

$$\begin{aligned}
[\sigma(f)](v) &= [(\phi_1)^{-1}\tau^*\phi_2(f)](v) \\
&= \langle v, \tau^*\phi_2(f)\rangle = \langle \tau(v), \phi_2(f)\rangle = f(\tau(v)) = [\tau^{\times}(f)](v)
\end{aligned}$$

and so $\sigma = \tau^{\times}$. Hence, the relationship between $\tau^{\times}$ and $\tau^*$ is given 
by 

$$\tau^{\times} = (\phi_1)^{-1}\tau^*\phi_2$$

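In Chapter 3, we showed that the matrix of the operator adjoint $\tau^{\times}$ 
is the transpose of the matrix of the map $\tau$. For Hilbert space 
adjoints, the situation is slightly different. Suppose that $\mathcal{B} = 
(b_1,\ldots,b_n)$ is an ordered orthonormal basis for V, and $\mathcal{C} = (c_1,\ldots,c_m)$ 
is an ordered orthonormal basis for W. If we let 

$$[\tau]_{\mathcal{B},\mathcal{C}} = (a_{i,j})$$

then $a_{i,j}$ is the coordinate of $c_i$ in $\tau(b_j)$, that is 

$$a_{i,j} = \langle \tau(b_j), c_i\rangle$$

On the other hand, if 

$$[\tau^*]_{\mathcal{C},\mathcal{B}} = (\alpha_{i,j})$$

then $\alpha_{i,j}$ is the coordinate of $b_i$ in $\tau^*(c_j)$, that is 

$$\alpha_{i,j} = \langle \tau^*(c_j), b_i\rangle = \overline{\langle b_i, \tau^*(c_j)\rangle} = \overline{\langle \tau(b_i), c_j\rangle} = \overline{a_{j,i}}$$

If $A = (a_{i,j})$ is a matrix over F, then the conjugate transpose of A is 
the matrix 

$$A^* = (\overline{a_{i,j}})^t$$

With this terminology, we have proved the following. 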

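Theorem 10.2 Let $\tau \in \mathcal{L}(V,W)$, where V and W are finite 
dimensional inner product spaces. Let $\mathcal{B}$ and $\mathcal{C}$ be ordered 
orthonormal bases for V and W, respectively. Then 

$$[\tau^*]_{\mathcal{C},\mathcal{B}} = \left([\tau]_{\mathcal{B},\mathcal{C}}\right)^*$$

In words, the matrix of the adjoint $\tau^*$ is the conjugate transpose of 
the matrix of $\tau$. ∎

Here are some of the basic properties of the adjoint. 

Theorem 10.3 Let $\sigma,\tau \in \mathcal{L}(V,W)$, where V and W are finite 
dimensional. 

1) $\langle \tau^*(v), u\rangle = \langle v, \tau(u)\rangle$ 

2) $(\sigma + \tau)^* = \sigma^* + \tau^*$ 

3) $(r\tau)^* = \bar{r}\tau^*$ 

4) $\tau^{**} = \tau$ 

5) $(\sigma\tau)^* = \tau^*\sigma^*$ if $V = W$ 

6) If $\tau$ is invertible, then $(\tau^{-1})^* = (\tau^*)^{-1}$ 

Proof. We prove (3) only: 

$$\langle v, (r\tau)^*u\rangle = \langle r\tau(v), u\rangle = r\langle \tau(v), u\rangle = r\langle v, \tau^*(u)\rangle = \langle v, \bar{r}\tau^*(u)\rangle$$

and so $(r\tau)^* = \bar{r}\tau^*$. ∎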
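Theorem 10.2 can be checked numerically in coordinates. In the sketch below, a map from $\mathbb{C}^3$ to $\mathbb{C}^2$ is given by a matrix with respect to the standard (orthonormal) bases, and the conjugate transpose is verified to satisfy the defining condition of the adjoint. The matrix, the vectors and the helper names are assumptions made for illustration, and the inner product is the standard one, linear in the first argument.

```python
def inner(u, v):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def mat_vec(A, v):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def conjugate_transpose(A):
    """Return A* = conjugate of the transpose of A."""
    rows, cols = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(rows)] for j in range(cols)]

A = [[1 + 1j, 2, 0],
     [3j, 1 - 2j, 4]]              # matrix of a map C^3 -> C^2
A_star = conjugate_transpose(A)    # matrix of a map C^2 -> C^3

v = [1, 1j, 2 - 1j]                # a vector in C^3
w = [2 + 1j, -1j]                  # a vector in C^2

lhs = inner(mat_vec(A, v), w)        # <tau(v), w>
rhs = inner(v, mat_vec(A_star, w))   # <v, tau*(w)>
print(lhs, rhs)
```

The two printed values should agree. With respect to bases that are not orthonormal, the conjugate transpose would not in general represent the adjoint.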

Orthogonal Diagonalizability 

Recall that a linear operator $\tau \in \mathcal{L}(V)$ on a finite dimensional 
vector space V is diagonalizable if and only if V has a basis 
consisting entirely of eigenvectors of $\tau$, or equivalently, if and only if $\tau$ 
has a spectral resolution 

$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

where $\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, the $\lambda_i$’s are the 
distinct eigenvalues of $\tau$, and $\rho_i$ is projection onto the eigenspace $\mathcal{E}_{\lambda_i}$. 
Since in this case 

$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

the action of $\tau$ can be described in the simple form 

$$v = v_1 + \cdots + v_k \quad\Rightarrow\quad \tau(v) = \lambda_1v_1 + \cdots + \lambda_kv_k$$

where $v_i \in \mathcal{E}_{\lambda_i}$ for all i. 

While this description of $\tau$ is simple, it does require finding the 
components of v that belong to each eigenspace $\mathcal{E}_{\lambda_i}$ which, in general, 
requires solving a system of equations. 

However, suppose that V is a finite dimensional inner product 
space, and that O is an ordered orthonormal basis consisting entirely 
of eigenvectors of $\tau$. If 

$$O_i = (u_{i,1},\ldots,u_{i,k_i})$$

is the subset of O consisting of the eigenvectors associated to $\lambda_i$, then 
it is not hard to see that $O_i$ is an ordered orthonormal basis for $\mathcal{E}_{\lambda_i}$, 
and the component of v in $\mathcal{E}_{\lambda_i}$ is the easily computed Fourier 
expansion 

$$v_i = \langle v, u_{i,1}\rangle u_{i,1} + \cdots + \langle v, u_{i,k_i}\rangle u_{i,k_i}$$

of v with respect to $O_i$. Hence, the action of $\tau$ has the truly simple 
form 

$$v = v_1 + \cdots + v_k \quad\Rightarrow\quad \tau(v) = \lambda_1v_1 + \cdots + \lambda_kv_k$$

Definition Let V be a finite dimensional inner product space, and let 
$\tau \in \mathcal{L}(V)$. If there is an orthonormal basis O for V for which $[\tau]_O$ 
is a diagonal matrix, we say that $\tau$ is orthogonally diagonalizable. □

It is clear from this discussion that $\tau$ is orthogonally 
diagonalizable if and only if there is an orthonormal basis for V 
consisting entirely of eigenvectors of $\tau$, that is, if and only if 

$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$
Thus, orthogonally diagonalizable operators are very well behaved 
indeed, and this leads us to seek a simple criterion for determining 
whether or not a given operator is orthogonally diagonalizable. 
Remarkably, there is a simple criterion. 



Motivation 

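By way of motivation, suppose that V is a finite dimensional 
inner product space over F, and that all of the roots of the 
characteristic polynomial of $\tau \in \mathcal{L}(V)$ lie in F, that is, that the 
minimal polynomial of $\tau$ splits into a product of linear factors over F, 

$$m_\tau(x) = (x - \lambda_1)^{e_1}\cdots(x - \lambda_k)^{e_k}$$

where the $\lambda_i$’s are the distinct eigenvalues of $\tau$. (This happens for all 
operators on a complex inner product space, for instance.) Then, 
according to the primary decomposition theorem, we may write V as 
the direct sum 

$$V = V_1 \oplus \cdots \oplus V_k$$

where 

$$V_i = \{v \in V \mid (\tau - \lambda_i)^{e_i}(v) = 0\}$$

If v is an eigenvector of $\tau$ associated with $\lambda_i$, then 
$(\tau - \lambda_i)(v) = 0$, and so $v \in V_i$. In other words, $\mathcal{E}_{\lambda_i} \subseteq V_i$. Thus, $\tau$ 
will be orthogonally diagonalizable if and only if 

1) $\mathcal{E}_{\lambda_i} = V_i$, for all i, and 

2) $\mathcal{E}_{\lambda_i} \perp \mathcal{E}_{\lambda_j}$, for $i \ne j$. 

Let us consider property (2) first. This property is equivalent to 

$$\tau(v) = \lambda_iv \ \text{ and } \ \tau(w) = \lambda_jw \quad\Rightarrow\quad \langle v, w\rangle = 0$$

for $i \ne j$. Now, let us observe that 

$$\lambda_i\langle v, w\rangle = \langle \lambda_iv, w\rangle = \langle \tau(v), w\rangle = \langle v, \tau^*(w)\rangle$$

and further, if it were true that $\tau^*(w) = \bar{\lambda}_jw$, then we could continue 

$$= \langle v, \bar{\lambda}_jw\rangle = \lambda_j\langle v, w\rangle$$

which implies that $\langle v, w\rangle = 0$ (since $\lambda_i \ne \lambda_j$). Thus, if $\tau$ has the 
property that 

$$\tau(w) = \lambda_jw \quad\Rightarrow\quad \tau^*(w) = \bar{\lambda}_jw$$

for all j, then property (2) will hold. This is equivalent to 

$$(\tau - \lambda_j)(w) = 0 \quad\Rightarrow\quad (\tau^* - \bar{\lambda}_j)(w) = 0$$

or, since $\lambda_j^* = \bar{\lambda}_j$ (that is, $(\lambda_j\iota)^* = \bar{\lambda}_j\iota$, where $\iota$ is the identity 
operator), 

$$(\tau - \lambda_j)(w) = 0 \quad\Rightarrow\quad (\tau - \lambda_j)^*(w) = 0$$

If we set $\sigma = \tau - \lambda_j$, then this is equivalent to 

$$\sigma(w) = 0 \quad\Rightarrow\quad \sigma^*(w) = 0$$

which in turn is equivalent to 

$$\langle \sigma(w), \sigma(w)\rangle = 0 \quad\Rightarrow\quad \langle \sigma^*(w), \sigma^*(w)\rangle = 0$$

This will hold if $\sigma^*\sigma = \sigma\sigma^*$, for in this case, 

$$\langle \sigma(w), \sigma(w)\rangle = \langle \sigma^*\sigma(w), w\rangle = \langle \sigma\sigma^*(w), w\rangle = \langle \sigma^*(w), \sigma^*(w)\rangle$$

But $\sigma^*\sigma = \sigma\sigma^*$ if and only if $\tau^*\tau = \tau\tau^*$, and so we conclude that 

$$\tau^*\tau = \tau\tau^* \quad\Rightarrow\quad \text{property (2) holds}$$

As we will see, if $\tau^*\tau = \tau\tau^*$, then property (1) holds as well! In any 
case, we have motivated one of the following definitions. 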

Definition Let V be an inner product space, and let $\tau \in \mathcal{L}(V)$. Then 

1) $\tau$ is self-adjoint, or Hermitian, if $\tau^* = \tau$. 

2) $\tau$ is unitary if it is bijective and $\tau^* = \tau^{-1}$. 

3) $\tau$ is normal if $\tau\tau^* = \tau^*\tau$. □

It is clear that self-adjoint and unitary operators are normal. 

There are also matrix versions of these definitions, but the 
terminology differs for real and complex matrices. Recall that if $A = 
(a_{i,j})$ is a matrix over F, then $A^* = (\overline{a_{i,j}})^t$ is the conjugate transpose 
of A. (If $F = \mathbb{R}$, then $A^* = A^t$.) 

Definition Let A be a complex matrix. Then 

1) A is Hermitian if $A^* = A$. 

2) A is skew-Hermitian if $A^* = -A$. 

3) A is unitary if it is invertible and $A^* = A^{-1}$. 

4) A is normal if $AA^* = A^*A$. 

Let A be a real matrix. Then $A^* = A^t$, and we say that 

5) A is symmetric if $A^t = A$. 

6) A is skew-symmetric if $A^t = -A$. 

7) A is orthogonal if A is invertible and $A^t = A^{-1}$. □

In the finite dimensional case, we have seen that 

$$[\tau^*]_O = [\tau]_O^*$$

for any ordered orthonormal basis O of V, and so if $\tau$ is normal, 
then 

$$[\tau]_O[\tau]_O^* = [\tau]_O[\tau^*]_O = [\tau\tau^*]_O = [\tau^*\tau]_O = [\tau^*]_O[\tau]_O = [\tau]_O^*[\tau]_O$$

which implies that the matrix $[\tau]_O$ of $\tau$ is normal. The converse 
holds as well. In fact, we can say that $\tau$ is normal (resp. self-adjoint, 
unitary) if and only if any matrix that represents $\tau$, with respect to an 
ordered orthonormal basis O, is normal (resp. Hermitian, unitary). 
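The matrix conditions above are simple to test on small examples. The following sketch checks the Hermitian, unitary and normal conditions for three sample 2 by 2 matrices, using a small numerical tolerance. The matrices and function names are illustrative assumptions, and the unitary test relies on the matrices being square, so that $AA^* = I$ is enough.

```python
def star(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def mat_mul(A, B):
    """Matrix product of A and B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def close(A, B, tol=1e-12):
    """Entrywise comparison up to a tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def is_hermitian(A):
    return close(A, star(A))

def is_normal(A):
    return close(mat_mul(A, star(A)), mat_mul(star(A), A))

def is_unitary(A):
    n = len(A)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return close(mat_mul(A, star(A)), identity)

H = [[2, 1 - 1j], [1 + 1j, 3]]   # Hermitian, hence normal
U = [[0, -1], [1, 0]]            # real orthogonal (rotation by 90 degrees)
N = [[1, 1], [0, 1]]             # a shear, which is not normal

for name, M in (("H", H), ("U", U), ("N", N)):
    print(name, is_hermitian(M), is_unitary(M), is_normal(M))
```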

Let us now turn to a discussion of the three types of operators 
that we have just defined. 



Self-Adjoint Operators 

By definition, an operator $\tau$ is self-adjoint if and only if 

$$\langle \tau(v), w\rangle = \langle v, \tau(w)\rangle$$

for all v,w G V. Here are some of the basic properties of these 
extremely important operators. 

Theorem 10.4 Let V be an inner product space, and let $\sigma,\tau \in \mathcal{L}(V)$. 

1) If $\sigma$ and $\tau$ are self-adjoint, so is $\sigma + \tau$. 

2) If $\tau$ is self-adjoint and r is real, then $r\tau$ is self-adjoint. 

3) If $\sigma$ and $\tau$ are self-adjoint, then $\sigma\tau$ is self-adjoint if and only 
if $\sigma\tau = \tau\sigma$. 

4) If $\tau$ is self-adjoint and invertible, then so is $\tau^{-1}$. 

Proof. We prove only (3). To this end, observe that $(\sigma\tau)^* = \tau^*\sigma^*$, 
and so 

$$(\sigma\tau)^* = \sigma\tau \iff \tau^*\sigma^* = \sigma\tau \iff \tau\sigma = \sigma\tau$$ ∎

Theorem 10.5 Let V be an inner product space. 

1) If $\tau$ is self-adjoint, then $\langle \tau(v), v\rangle$ is real, for all $v \in V$. 

2) If V is complex and $\langle \tau(v), v\rangle$ is real for all $v \in V$, then $\tau$ is 
self-adjoint. 

3) If $\tau$ is self-adjoint and $\langle \tau(v), v\rangle = 0$ for all $v \in V$, then $\tau = 0$ 
(cf. Theorem 9.6). 

4) If $\tau$ is self-adjoint, then $\tau^k(v) = 0$ for some $k > 0$ implies that 
$\tau(v) = 0$. 

5) If $\tau$ is self-adjoint, then all complex roots of the characteristic 
polynomial (and hence minimal polynomial) of $\tau$ are real. 

6) If $\lambda, \mu$ are distinct eigenvalues of a self-adjoint operator $\tau$, then 
$\mathcal{E}_\lambda \perp \mathcal{E}_\mu$. 
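Proof. 

1) For part (1), we have 

$$\langle \tau(v), v\rangle = \langle v, \tau(v)\rangle = \overline{\langle \tau(v), v\rangle}$$

and so $\langle \tau(v), v\rangle$ must be real. 

2) To prove part (2), we have 

$$\begin{aligned}
\langle (\tau - \tau^*)(v), v\rangle &= \langle \tau(v), v\rangle - \langle \tau^*(v), v\rangle \\
&= \langle \tau(v), v\rangle - \langle v, \tau(v)\rangle \\
&= \langle \tau(v), v\rangle - \overline{\langle \tau(v), v\rangle} \\
&= 0 \quad \text{(since } \langle \tau(v), v\rangle \text{ is real)}
\end{aligned}$$

Hence, according to Theorem 9.6, $\tau - \tau^* = 0$, which shows that 
$\tau$ is self-adjoint. 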

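3) As for part (3), Theorem 9.6 implies that this is true for the 
complex case, so we need only consider the real case, for which we 
have 

$$\begin{aligned}
0 &= \langle \tau(x + y), x + y\rangle \\
&= \langle \tau(x), x\rangle + \langle \tau(y), y\rangle + \langle \tau(x), y\rangle + \langle \tau(y), x\rangle \\
&= \langle \tau(x), y\rangle + \langle \tau(y), x\rangle \\
&= \langle \tau(x), y\rangle + \langle x, \tau(y)\rangle \\
&= \langle \tau(x), y\rangle + \langle \tau(x), y\rangle \\
&= 2\langle \tau(x), y\rangle
\end{aligned}$$

and so $\tau = 0$. 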

4) If $\tau^k(v) = 0$ for some $k > 0$, then $\tau^{2^m}(v) = 0$ for some m. Thus, 

$$0 = \langle \tau^{2^m}(v), v\rangle = \langle \tau^{2^{m-1}}\tau^{2^{m-1}}(v), v\rangle = \langle \tau^{2^{m-1}}(v), \tau^{2^{m-1}}(v)\rangle$$

and so $\tau^{2^{m-1}}(v) = 0$. Repeating this process, we eventually get 
$\tau(v) = 0$. 

5) Suppose first that V is a complex vector space, and that $\lambda$ is a 
root of $C_\tau(x)$. Then $\tau(v) = \lambda v$ for some $v \ne 0$, and we have 
$$\langle \tau(v), v \rangle = \langle \lambda v, v \rangle = \lambda \langle v, v \rangle$$
and 
$$\langle \tau(v), v \rangle = \langle v, \tau(v) \rangle = \langle v, \lambda v \rangle = \overline{\lambda} \langle v, v \rangle$$

and so $\lambda = \overline{\lambda}$, which shows that $\lambda$ is real. 

If V is a real vector space, we must be careful, since if $\lambda$ 
is a complex root of $C_\tau(x)$, it does not follow that $\tau(v) = \lambda v$ for 
some $0 \ne v \in V$. However, we can proceed as follows. Let $\tau$ be 
represented by the matrix A, with respect to some ordered orthonormal 
basis for V. Then $C_\tau(x) = C_A(x)$. Now, A is a real symmetric 
matrix, but can be thought of as a complex Hermitian matrix 
that happens to have real entries. As such, it represents a 
self-adjoint linear operator on the complex space $\mathbb{C}^n$ and so, by what 
we have just shown, all (complex) roots of its characteristic 
polynomial are real. But the characteristic polynomial of A is 
the same, whether we think of A as a real or a complex matrix, 
and so the result follows. 

6) Suppose that $\tau(v) = \lambda v$ and $\tau(w) = \mu w$, where $v, w \ne 0$. 
Since $\mu$ is real by part (5), 
$$\lambda \langle v, w \rangle = \langle \tau(v), w \rangle = \langle v, \tau(w) \rangle = \langle v, \mu w \rangle = \mu \langle v, w \rangle$$
and so $\lambda \ne \mu$ implies that $\langle v, w \rangle = 0$. ■ 

Of course, the fact that all complex eigenvalues of a self-adjoint 
operator $\tau$ are real implies that the minimal polynomial of $\tau$ 
factors into a product of linear factors. 
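
As a small numerical illustration of parts (5) and (6) of Theorem 10.5, the following sketch checks a single 2 x 2 Hermitian matrix. The matrix, the helper names `eigenvalues_2x2` and `inner`, and the eigenvector formula are choices made here for illustration and are not part of the text; the inner product is taken linear in the first slot and conjugate-linear in the second, as in the proofs above.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of the characteristic polynomial x^2 - (a+d)x + (ad - bc)."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

def inner(v, w):
    """<v, w>, linear in v and conjugate-linear in w."""
    return sum(x * complex(y).conjugate() for x, y in zip(v, w))

# Hermitian matrix A = [[2, 1-i], [1+i, 3]] (equal to its conjugate transpose).
a, b, c, d = 2, 1 - 1j, 1 + 1j, 3
lam, mu = eigenvalues_2x2(a, b, c, d)

# For b != 0, the vector (b, x - a) is an eigenvector of A for the eigenvalue x,
# because the first row of (A - xI) annihilates it and A - xI is singular.
v = (b, lam - a)
w = (b, mu - a)

print(lam, mu)      # the imaginary parts vanish: the eigenvalues are real
print(inner(v, w))  # eigenvectors for distinct eigenvalues are orthogonal
```

One example proves nothing, of course; it only shows what the two statements look like in coordinates.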



Unitary Operators 

We now turn to the basic properties of unitary operators. Note 
that $\tau$ is unitary if and only if 
$$\langle \tau(v), w \rangle = \langle v, \tau^{-1}(w) \rangle$$

for all $v, w \in V$. 

Theorem 10.6 Let V be an inner product space, and let 
$\sigma, \tau \in \mathcal{L}(V)$. 

1) If $\tau$ is unitary, so is $\tau^{-1}$. 

2) If $\sigma, \tau$ are unitary, so is $\sigma\tau$. 

3) $\tau$ is unitary if and only if it is a surjective isometry. 

4) If $\dim(V) < \infty$, then $\tau$ is unitary if and only if $\tau$ takes an 
orthonormal basis to an orthonormal basis. 

5) If $\tau$ is unitary, then the eigenvalues of $\tau$ have absolute value 1. 

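Proof. We leave the proofs of (1) and (2) to the reader. 

3) For a bijective linear map $\tau$, we have 
$$\begin{aligned}
\tau \text{ is an isometry} &\iff \langle \tau(v), \tau(w) \rangle = \langle v, w \rangle \text{ for all } v, w \in V \\
&\iff \langle v, \tau^*\tau(w) \rangle = \langle v, w \rangle \text{ for all } v, w \in V \\
&\iff \tau^*\tau(w) = w \text{ for all } w \in V \\
&\iff \tau^*\tau = \iota \\
&\iff \tau^* = \tau^{-1} \\
&\iff \tau \text{ is unitary}
\end{aligned}$$

4) Suppose that $\tau$ is unitary, and that $\mathcal{O} = \{u_1, \dots, u_n\}$ is an 
orthonormal basis for V. Then 
$$\langle \tau(u_i), \tau(u_j) \rangle = \langle u_i, u_j \rangle = \delta_{ij}$$

and so $\tau(\mathcal{O})$ is an orthonormal basis for V. Conversely, suppose 
that $\mathcal{O}$ and $\tau(\mathcal{O})$ are orthonormal bases for V. Then 
$$\langle \tau(u_i), \tau(u_j) \rangle = \delta_{ij} = \langle u_i, u_j \rangle$$

and so, if $v = \sum_i r_i u_i$ and $w = \sum_j s_j u_j$, we have 
$$\begin{aligned}
\langle \tau(v), \tau(w) \rangle &= \Big\langle \sum_i r_i \tau(u_i), \sum_j s_j \tau(u_j) \Big\rangle \\
&= \sum_{i,j} r_i \overline{s_j} \langle \tau(u_i), \tau(u_j) \rangle \\
&= \sum_{i,j} r_i \overline{s_j} \langle u_i, u_j \rangle \\
&= \Big\langle \sum_i r_i u_i, \sum_j s_j u_j \Big\rangle \\
&= \langle v, w \rangle
\end{aligned}$$

and so $\tau$ is unitary. 

5) If $\tau$ is unitary, and $\tau(v) = \lambda v$ with $v \ne 0$, then 
$$\lambda \overline{\lambda} \langle v, v \rangle = \langle \lambda v, \lambda v \rangle = \langle \tau(v), \tau(v) \rangle = \langle v, v \rangle$$
and so $|\lambda|^2 = \lambda \overline{\lambda} = 1$, which implies that $|\lambda| = 1$. ■ 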

We also have the following theorem concerning unitary (and 
orthogonal) matrices. 

Theorem 10.7 Let A be a matrix. 

1) An n x n matrix A is unitary if and only if the columns of A 
form an orthonormal set in $\mathbb{C}^n$. 

2) An n x n matrix A is unitary if and only if the rows of A form 
an orthonormal set in $\mathbb{C}^n$. 

3) If A is unitary, then $|\det(A)| = 1$. In particular, if A is 
orthogonal, then $\det(A) = \pm 1$. 

Proof. The matrix A is unitary if and only if $AA^* = I$, which is 
equivalent to saying that the rows of A are orthonormal. Similarly, 
A is unitary if and only if $A^*A = I$, which is equivalent to saying that 
the columns of A are orthonormal. As for part (3), we have 
$$AA^* = I \;\Rightarrow\; \det(A)\det(A^*) = 1 \;\Rightarrow\; \det(A)\overline{\det(A)} = 1$$
from which the result follows. ■ 
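
The three parts of Theorem 10.7 can be checked on a concrete matrix. The sketch below is only an illustration under assumptions made here: the matrix U and the helper names `conj_transpose`, `matmul` and `is_identity` are not from the text, and equality is tested up to a small floating-point tolerance.

```python
import math

def conj_transpose(m):
    """A*: the conjugate transpose of a square matrix given as a list of rows."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
U = [[s, s],
     [s * 1j, -s * 1j]]   # columns (1, i)/sqrt(2) and (1, -i)/sqrt(2)

cols_orthonormal = is_identity(matmul(conj_transpose(U), U))  # U*U = I
rows_orthonormal = is_identity(matmul(U, conj_transpose(U)))  # UU* = I
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]

print(cols_orthonormal, rows_orthonormal, abs(det_U))
```

Here the determinant is not real, which is consistent with part (3): only its absolute value is forced to be 1 unless the matrix is real orthogonal.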



Normal Operators 

Now let us discuss the properties of normal operators, including 
the key properties that we used to motivate the definition of normal 
operators. 

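Theorem 10.8 Let V be an inner product space, and let $\tau$ be a 
normal operator on V. 

1) For any polynomial $p(x) \in F[x]$, the operator $p(\tau)$ is also 
normal. 

2) $\tau(v) = 0 \Rightarrow \tau^*(v) = 0$ 

3) $\tau^k(v) = 0$ for any $k > 0$ $\Rightarrow$ $\tau(v) = 0$ 

4) For any $\lambda \in F$, $(\tau - \lambda)^k(v) = 0 \Rightarrow (\tau - \lambda)(v) = 0$ 

5) If $\tau(v) = \lambda v$, then $\tau^*(v) = \overline{\lambda} v$. 

6) If $\lambda, \mu$ are distinct eigenvalues of $\tau$, then 
$\mathcal{E}_\lambda \perp \mathcal{E}_\mu$. 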

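Proof. We leave the proofs of parts (1) and (2) as exercises. 

3) The operator $\sigma = \tau^*\tau$ is easily seen to be self-adjoint, and 
since $\tau$ is normal (so that $\tau$ and $\tau^*$ commute), we have 
$$\sigma^k(v) = (\tau^*)^k \tau^k(v) = 0$$

and so, according to Theorem 10.5, $\sigma(v) = 0$, that is, 
$\tau^*\tau(v) = 0$. But then 
$$0 = \langle \tau^*\tau(v), v \rangle = \langle \tau(v), \tau(v) \rangle$$

and so $\tau(v) = 0$. 

4) Part (4) follows from parts (1) and (3). 

5) Suppose that $\tau(v) = \lambda v$, where $v \ne 0$. Then $(\tau - \lambda)(v) = 0$. 
Since $\tau - \lambda$ is normal by part (1), part (2) gives 
$(\tau - \lambda)^*(v) = 0$. But $(\tau - \lambda)^* = \tau^* - \overline{\lambda}$, 
from which the result follows. 

6) Suppose that $\tau(v) = \lambda v$ and $\tau(w) = \mu w$, where $v, w \ne 0$. 
Then, using part (5), 
$$\lambda \langle v, w \rangle = \langle \tau(v), w \rangle = \langle v, \tau^*(w) \rangle = \langle v, \overline{\mu} w \rangle = \mu \langle v, w \rangle$$
and so $\lambda \ne \mu$ implies that $\langle v, w \rangle = 0$. ■ 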

Orthogonal Diagonalization 

We are now in a position to state one of the most beautiful 
theorems in linear algebra. 

Theorem 10.9 Let V be a finite dimensional complex inner product 
space. 

1) A linear operator $\tau$ on V is orthogonally diagonalizable if and 
only if it is normal. 

2) Among all normal operators on V, we can characterize self- 
adjoint and unitary ones by their eigenvalues. To wit: 

a) A normal operator is self-adjoint if and only if all of its 
eigenvalues are real. 

b) A normal operator is unitary if and only if all of its 
eigenvalues have absolute value 1. 

Proof. To prove part (1), let $\tau$ be a normal operator on a complex 
inner product space. If the prime factorization of the minimal 
polynomial of $\tau$ is 
$$m_\tau(x) = (x - \lambda_1)^{e_1} \cdots (x - \lambda_k)^{e_k}$$
then the primary decomposition theorem gives 
$$V = V_1 \oplus \cdots \oplus V_k$$

where, according to part (4) of Theorem 10.8, 
$$V_i = \{v \in V \mid (\tau - \lambda_i)^{e_i}(v) = 0\} = \{v \in V \mid (\tau - \lambda_i)(v) = 0\} = \mathcal{E}_{\lambda_i}$$

Hence, the minimal polynomial of $\tau|_{V_i}$ is $x - \lambda_i$, and so 
$e_i = 1$ for all i. Thus 
$$V = \mathcal{E}_{\lambda_1} \oplus \cdots \oplus \mathcal{E}_{\lambda_k}$$

Moreover, part (6) of Theorem 10.8 shows that V is the orthogonal 
direct sum 
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$

Hence, we may construct an orthonormal basis of eigenvectors of $\tau$ by 
combining orthonormal bases for each eigenspace, and so $\tau$ is 
orthogonally diagonalizable. 

For the converse, if $\tau$ is orthogonally diagonalizable, then there 
is an orthonormal basis $\mathcal{O} = \{u_1, \dots, u_n\}$ for V consisting of 
eigenvectors of $\tau$, say $\tau(u_i) = \lambda_i u_i$. Then 
$$\langle u_i, \tau^*(u_j) \rangle = \langle \tau(u_i), u_j \rangle = \lambda_i \langle u_i, u_j \rangle = \lambda_i \delta_{ij} = \lambda_j \delta_{ij} = \langle u_i, \overline{\lambda_j} u_j \rangle$$

and so $\tau^*(u_j) = \overline{\lambda_j} u_j$. Thus, 
$$\tau\tau^*(u_j) = \overline{\lambda_j} \tau(u_j) = \overline{\lambda_j} \lambda_j u_j = \lambda_j \overline{\lambda_j} u_j = \lambda_j \tau^*(u_j) = \tau^*\tau(u_j)$$
and so $\tau$ is normal. 

As for part (2a), we have already seen that a self-adjoint operator 
is normal, and has real eigenvalues. On the other hand, if $\tau$ is normal 
and has real eigenvalues, then for any eigenvector $u_j$ associated to 
$\lambda_j$, 
$$\tau^*(u_j) = \overline{\lambda_j} u_j = \lambda_j u_j = \tau(u_j)$$

and since there is a basis of eigenvectors, $\tau$ is self-adjoint. The proof 
of part (2b) is similar. ■ 

Thus, on a finite dimensional complex inner product space, 
diagonal matrices form a set of canonical forms for the class of normal 
operators (at least up to order of the diagonal entries). For real inner 
product spaces, the situation is a bit different. 

Theorem 10.10 A linear operator $\tau$ on a finite dimensional real inner 
product space is orthogonally diagonalizable if and only if it is 
self-adjoint. 

Proof. Suppose that V is a real inner product space. If $\tau$ is 
self-adjoint, then according to part (5) of Theorem 10.5, the minimal 
polynomial of $\tau$ splits over $\mathbb{R}$. Moreover, parts (4) and (6) of 
Theorem 10.5 (with part (4) applied to the symmetric operator 
$\tau - \lambda$) show that V has an orthonormal basis of eigenvectors for 
$\tau$. Hence, $\tau$ is orthogonally diagonalizable. (This is similar to the 
proof of Theorem 10.9.) 

Here is a matrix proof of the converse. If $\tau$ is orthogonally 
diagonalizable, then there is an orthonormal basis $\mathcal{O}$ for V for which 
$[\tau]_\mathcal{O}$ is diagonal, and since $[\tau]_\mathcal{O}$ is real, it is symmetric. 
Hence, 
$$[\tau^*]_\mathcal{O} = [\tau]_\mathcal{O}^* = [\tau]_\mathcal{O}^T = [\tau]_\mathcal{O}$$

and so $\tau^* = \tau$. ■ 

The matrix versions of Theorems 10.9 and 10.10 are as follows. 

Theorem 10.11 

1) Let A be a square complex matrix. Then there exists a unitary 
matrix U for which $UAU^{-1}$ is diagonal if and only if A is 
normal. 

2) Let A be a square real matrix. Then there exists an orthogonal 
matrix O for which $OAO^{-1}$ is diagonal if and only if A is 
symmetric. ■ 

We must work a little harder to find a canonical form for unitary 
operators over the real field. The problem is that the minimal 
polynomial $m_\tau(x)$ of a real unitary operator $\tau$ may not split over 
$\mathbb{R}$. 

However, we can proceed as follows. If $\tau$ is real unitary, then 
$\sigma = \tau + \tau^* = \tau + \tau^{-1}$ is self-adjoint, and has a complete set 
of real eigenvalues, so we may decompose V as in the proof of 
Theorem 10.9, 
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$

where 
$$\mathcal{E}_{\lambda_i} = \{v \in V \mid (\tau + \tau^{-1} - \lambda_i)(v) = 0\}$$

or, multiplying by $\tau$, 
$$\mathcal{E}_{\lambda_i} = \{v \in V \mid (\tau^2 - \lambda_i \tau + 1)(v) = 0\}$$

If $\lambda_i = 2$, then since $\tau$ is normal, we have 
$$\mathcal{E}_2 = \{v \in V \mid (\tau - 1)^2(v) = 0\} = \{v \in V \mid (\tau - 1)(v) = 0\}$$
and if $\lambda_i = -2$, 
$$\mathcal{E}_{-2} = \{v \in V \mid (\tau + 1)^2(v) = 0\} = \{v \in V \mid (\tau + 1)(v) = 0\}$$

Thus, on the eigenspaces $\mathcal{E}_2$ and $\mathcal{E}_{-2}$ (if indeed they exist), the 
operator $\tau$ is just multiplication by 1 or -1, respectively. 

We may decompose each $\mathcal{E}_{\lambda_i}$, for $\lambda_i \ne \pm 2$, as follows. Take 
$v \in \mathcal{E}_{\lambda_i}$, and consider $\mathrm{span}\{v, \tau(v)\}$. This subspace of 
$\mathcal{E}_{\lambda_i}$ is invariant under $\tau$, since 
$\tau(\tau(v)) = \tau^2(v) = \lambda_i \tau(v) - v$. Thus, we can write 
$$\mathcal{E}_{\lambda_i} = \mathrm{span}\{v, \tau(v)\} \odot \mathrm{span}\{v, \tau(v)\}^{\perp}$$

Continuing in this way, we can write each $\mathcal{E}_{\lambda_i}$ as the orthogonal 
direct sum of two-dimensional subspaces on which $\tau$ is real unitary. 
This gives 
$$V = \mathcal{E}_2 \odot \mathcal{E}_{-2} \odot \mathcal{D}_1 \odot \cdots \odot \mathcal{D}_m$$

where $\dim(\mathcal{D}_i) = 2$ and each summand is invariant under $\tau$. 

Hence, we need only determine the matrix of a real unitary 
operator $\tau$ on a two-dimensional space $\mathcal{D}$. The matrix of $\tau$ with 
respect to any orthonormal basis for $\mathcal{D}$ is orthogonal, and so if 
$$[\tau] = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

then it follows that 
$$a^2 + b^2 = 1, \qquad c^2 + d^2 = 1, \qquad ac + bd = 0$$

Moreover, since $\det([\tau])$ is the constant term of the minimal polynomial 
$m_\tau(x) = x^2 - \lambda_i x + 1$, we have $\det(\tau) = 1$, that is, 
$$ad - bc = 1$$

Solving these equations gives $d = a$ and $c = -b$, and so 
$$[\tau] = \begin{pmatrix} a & b \\ -b & a \end{pmatrix}$$

Since $(a, b)$ is a unit vector in $\mathbb{R}^2$, we can write 
$(a, b) = (\cos\theta, \sin\theta)$ for some real $\theta$, and so 
$$[\tau] = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
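
The two-dimensional computation above can be sanity-checked numerically. The sketch below, with an angle and helper names (`rotation_block`, `close`) chosen here purely for illustration, confirms that the matrix just obtained is orthogonal with determinant 1 and that it satisfies $x^2 - \lambda x + 1$ for $\lambda = 2\cos\theta$, which is the value of $\lambda_i$ attached to such a block.

```python
import math

def rotation_block(theta):
    """The matrix [[cos t, sin t], [-sin t, cos t]] from the derivation above."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

theta = 2 * math.pi / 5          # an arbitrary angle, not a multiple of pi
R = rotation_block(theta)
I2 = [[1.0, 0.0], [0.0, 1.0]]

is_orthogonal = close(matmul(transpose(R), R), I2)
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]

# R^2 - lam*R + I should vanish when lam = 2 cos(theta).
lam = 2 * math.cos(theta)
R2 = matmul(R, R)
residual = [[R2[i][j] - lam * R[i][j] + I2[i][j] for j in range(2)]
            for i in range(2)]
satisfies_polynomial = close(residual, [[0.0, 0.0], [0.0, 0.0]])

print(is_orthogonal, det_R, satisfies_polynomial)
```

Since $|2\cos\theta| < 2$ whenever $\theta$ is not a multiple of $\pi$, such a block never lands in the eigenspaces for $\lambda_i = \pm 2$.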



Thus, we arrive at the following result. 

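Theorem 10.12 Let $\tau$ be a unitary operator on a finite dimensional 
real inner product space V. Then there is an orthonormal basis for V 
for which the matrix of $\tau$ has the block form 
$$\begin{pmatrix}
1 & & & & & & & & \\
& \ddots & & & & & & & \\
& & 1 & & & & & & \\
& & & -1 & & & & & \\
& & & & \ddots & & & & \\
& & & & & -1 & & & \\
& & & & & & R_{\theta_1} & & \\
& & & & & & & \ddots & \\
& & & & & & & & R_{\theta_m}
\end{pmatrix},
\qquad
R_\theta = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$
■ 



Orthogonal Projections 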

We now wish to characterize orthogonal diagonalizability in terms 
of projection operators. 

Definition Let $V = S \odot S^\perp$. The projection map $\rho: V \to S$ on S along 
$S^\perp$ is called orthogonal projection onto S. Put another way, a 
projection map $\rho$ is an orthogonal projection if 
$V = \mathrm{im}(\rho) \odot \ker(\rho)$. □ 

Thus, orthogonal projection is just a special type of projection 
operator, where $\ker(\rho) = \mathrm{im}(\rho)^\perp$. Note that some care must be taken 
to avoid confusion between the term orthogonal projection and the 
concept of projections that are orthogonal to each other, that is, for 
which $\rho\sigma = \sigma\rho = 0$. 

We saw in Chapter 8 that an operator p is a projection operator 
if and only if it is idempotent. Here is the analogous characterization of 
orthogonal projections. 

Theorem 10.13 A linear operator $\rho \in \mathcal{L}(V)$ is an orthogonal projection 
if and only if it is idempotent and self-adjoint. 

Proof. Suppose that $\rho$ is idempotent and self-adjoint. Then $\rho$ is 
projection on $\mathrm{im}(\rho)$ along $\ker(\rho)$, and $V = \mathrm{im}(\rho) \oplus \ker(\rho)$. 
Furthermore, if $x \in \ker(\rho)$, we have 
$$\langle \rho(v), x \rangle = \langle v, \rho(x) \rangle = 0$$

and so $\mathrm{im}(\rho) \perp \ker(\rho)$. Hence, $V = \mathrm{im}(\rho) \odot \ker(\rho)$, which 
shows that $\rho$ is orthogonal projection. 

For the converse, suppose that $\rho$ is orthogonal projection. Then 
$\rho$ is idempotent, and we need only show that $\rho$ is self-adjoint. Since 
$\rho$ is orthogonal projection, we have $V = \mathrm{im}(\rho) \odot \ker(\rho)$. But if 
$v \in \mathrm{im}(\rho)$, then $v = \rho(w)$ and so 
$$\rho(v) = \rho(\rho(w)) = \rho^2(w) = \rho(w) = v$$

Hence all nonzero vectors in $\mathrm{im}(\rho)$ are eigenvectors associated with the 
eigenvalue 1. Moreover, if $x \in \ker(\rho)$, then 
$$\rho(x) = 0 = 0x$$

and so all nonzero vectors in $\ker(\rho)$ are eigenvectors associated with 
the eigenvalue 0. Therefore, we can find an orthonormal basis for V 
that consists entirely of eigenvectors for $\rho$, which means that $\rho$ is 
normal. Finally, since the eigenvalues of $\rho$ are real, $\rho$ must be 
self-adjoint. ■ 

Note that for an orthogonal projection $\rho$, we have 
$$\langle v, \rho(v) \rangle = \langle v, \rho^2(v) \rangle = \langle \rho(v), \rho(v) \rangle$$

The following theorem gives another characterization of orthogonal 
projections. 

Theorem 10.14 A linear operator $\rho \in \mathcal{L}(V)$ is an orthogonal projection 
if and only if it is idempotent and $\|\rho(v)\| \le \|v\|$ for all $v \in V$. 

Proof. We leave proof of the necessity as an exercise. Suppose that $\rho$ 
is idempotent and that $\|\rho(v)\| \le \|v\|$. We want to show that 
$V = \ker(\rho) \odot \mathrm{im}(\rho)$, which can be done, according to Theorem 9.13, 
by showing that $\mathrm{im}(\rho) \subseteq \ker(\rho)^\perp$. Now, the key to this is the 
fact that $V = \ker(\rho) \odot \ker(\rho)^\perp$, which holds for 
$\dim(\ker(\rho)) < \infty$ by the projection theorem. However, we will see in 
Chapter 13 that it is also true in general. 

Proceeding under this assumption then, for any $w \in \mathrm{im}(\rho)$, we 
have $w = x + y$, where $x \in \ker(\rho)$ and $y \in \ker(\rho)^\perp$, and since $\rho$ is 
idempotent, 
$$w = \rho(w) = \rho(x) + \rho(y) = \rho(y)$$

and so 
$$\|x\|^2 + \|y\|^2 = \|w\|^2 = \|\rho(y)\|^2 \le \|y\|^2$$

which implies that $\|x\| = 0$, and hence that $x = 0$. Thus, 
$w = y \in \ker(\rho)^\perp$, and so $\mathrm{im}(\rho) \subseteq \ker(\rho)^\perp$, as desired. ■ 
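
The two characterizations in Theorems 10.13 and 10.14 can be seen side by side on a pair of 2 x 2 real matrices. This is a minimal sketch under assumptions made here: the formula $uu^T/(u \cdot u)$ for projection onto a line, the oblique projection Q, and all helper names are illustrative choices rather than anything taken from the text. Exact rational arithmetic is used so that the equalities are tested exactly.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def apply(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def norm_sq(v):
    return sum(x * x for x in v)

def projection_onto_line(u):
    """Matrix u u^T / (u . u) of orthogonal projection onto span{u} in R^n."""
    uu = sum(x * x for x in u)
    return [[Fraction(x * y, uu) for y in u] for x in u]

P = projection_onto_line([1, 2])
Q = [[1, 1], [0, 0]]   # projection onto the x-axis along span{(1, -1)}

P_idempotent = matmul(P, P) == P
P_symmetric = transpose(P) == P
Q_idempotent = matmul(Q, Q) == Q
Q_symmetric = transpose(Q) == Q

# Q is idempotent but not self-adjoint, and it lengthens the vector (1, 1).
v = [1, 1]
Q_expands_v = norm_sq(apply(Q, v)) > norm_sq(v)

print(P_idempotent, P_symmetric, Q_idempotent, Q_symmetric, Q_expands_v)
```

The oblique projection fails both tests at once, as the two theorems together require: it is not symmetric, and it violates the norm bound.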

The next three theorems give some additional properties of 
orthogonal projections. 

Theorem 10.15 

1) If $\rho$ and $\sigma$ are both orthogonal projections, then $\rho\sigma = 0$ 
implies $\sigma\rho = 0$. 

2) Two orthogonal projections $\rho$ and $\sigma$ are orthogonal to each 
other if and only if $\mathrm{im}(\rho) \perp \mathrm{im}(\sigma)$. ■ 

Theorem 10.16 Let V be a vector space over a field of characteristic 
$\ne 2$. 

1) Let $\rho$ and $\sigma$ both be orthogonal projections. Then $\rho + \sigma$ is an 
orthogonal projection if and only if $\rho \perp \sigma$, in which case 
$\rho + \sigma$ is projection on $\mathrm{im}(\rho) \odot \mathrm{im}(\sigma)$ along 
$\ker(\rho) \cap \ker(\sigma)$. 

2) Let $\rho_1, \dots, \rho_k$ be orthogonal projections. Then 
$\rho = \rho_1 + \cdots + \rho_k$ is an orthogonal projection if and only if 
$\rho_i \perp \rho_j$ for all $i \ne j$. 

3) Let $\rho$ and $\sigma$ both be orthogonal projections. Then $\rho - \sigma$ is an 
orthogonal projection if and only if 
$$\rho\sigma = \sigma\rho = \sigma$$
in which case $\rho - \sigma$ is projection on $\mathrm{im}(\rho) \cap \ker(\sigma)$ along 
$\ker(\rho) \odot \mathrm{im}(\sigma)$. 

4) Let $\rho$ and $\sigma$ both be orthogonal projections. If 
$\rho\sigma = \sigma\rho$, then $\rho\sigma$ is an orthogonal projection. In this case, 
$\rho\sigma$ is projection on $\mathrm{im}(\rho) \cap \mathrm{im}(\sigma)$ along 
$\ker(\rho) + \ker(\sigma)$. 

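Proof. We prove only part (2). If the $\rho_i$'s are orthogonal projections, 
and if $\rho_i \perp \rho_j$ for all $i \ne j$, then $\rho_i\rho_j = 0$ for all $i \ne j$, 
and so it is straightforward to check that $\rho^2 = \rho$ and that 
$\rho^* = \rho$. Hence, $\rho$ is an orthogonal projection. Conversely, suppose 
that $\rho$ is an orthogonal projection, and that $x \in \mathrm{im}(\rho_i)$. Then 
$\rho_i(x) = x$, and so 
$$\|x\|^2 \ge \|\rho(x)\|^2 = \langle \rho(x), \rho(x) \rangle = \langle \rho(x), x \rangle
= \sum_j \langle \rho_j(x), x \rangle = \sum_j \|\rho_j(x)\|^2 \ge \|\rho_i(x)\|^2 = \|x\|^2$$

which implies that $\rho_j(x) = 0$ for $j \ne i$. In other words, 
$$\mathrm{im}(\rho_i) \subseteq \ker(\rho_j) = \mathrm{im}(\rho_j)^\perp$$

Therefore, 
$$0 = \langle \rho_j(v), \rho_i(w) \rangle = \langle \rho_i\rho_j(v), w \rangle$$

for all $v, w \in V$, which shows that $\rho_i\rho_j = 0$, that is, 
$\rho_i \perp \rho_j$. ■ 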

Theorem 10.17 The following statements are equivalent for orthogonal 
projections $\rho$ and $\sigma$. 

1) $\langle (\rho - \sigma)(v), v \rangle \ge 0$ for all $v \in V$ 

2) $\|\sigma(v)\| \le \|\rho(v)\|$ for all $v \in V$ 

3) $\mathrm{im}(\sigma) \subseteq \mathrm{im}(\rho)$ 

4) $\rho\sigma = \sigma$ 

5) $\sigma\rho = \sigma$ 

If any (and hence all) of these conditions obtain, we say that $\sigma$ is less 
than or equal to $\rho$, and write $\sigma \le \rho$. 

Proof. Suppose that (1) holds. Then 
$$\begin{aligned}
0 \le \langle (\rho - \sigma)(v), v \rangle &= \langle \rho(v), v \rangle - \langle \sigma(v), v \rangle \\
&= \langle \rho(v), \rho(v) \rangle - \langle \sigma(v), \sigma(v) \rangle = \|\rho(v)\|^2 - \|\sigma(v)\|^2
\end{aligned}$$

from which (2) follows. Next, suppose that (2) holds. Then for any 
$v \in \mathrm{im}(\sigma)$, we have $v = x + y$, where $x \in \mathrm{im}(\rho) \perp y \in \ker(\rho)$. 
Then, 
$$\|x\|^2 + \|y\|^2 = \|v\|^2 = \|\sigma(v)\|^2 \le \|\rho(v)\|^2 = \|x\|^2$$

and so $y = 0$, that is, $v \in \mathrm{im}(\rho)$. This proves (3). Now suppose that 
(3) holds. Then since $\sigma(v) \in \mathrm{im}(\sigma) \subseteq \mathrm{im}(\rho)$ for any $v \in V$, we 
have 
$$\rho(\sigma(v)) = \sigma(v)$$
and so $\rho\sigma = \sigma$. Hence, (4) holds. If (4) holds, then 
$\sigma\rho = \sigma^*\rho^* = (\rho\sigma)^* = \sigma^* = \sigma$, and so (5) holds. Finally, 
suppose that (5) holds. Then so does (4), and so $\rho - \sigma$ is an 
orthogonal projection, from which it follows that 
$$\langle (\rho - \sigma)(v), v \rangle = \langle (\rho - \sigma)(v), (\rho - \sigma)(v) \rangle \ge 0$$
and so (1) holds. ■ 

Orthogonal Resolutions of the Identity 

Definition If $\rho_1, \dots, \rho_k$ are orthogonal projections for which 

(10.2) $\rho_1 + \cdots + \rho_k = \iota$ 

is a resolution of the identity, then we refer to (10.2) as an orthogonal 
resolution of the identity. □ 

The following theorem displays a correspondence between 
orthogonal direct sum decompositions of V and orthogonal resolutions 
of the identity. It should be compared to Theorem 8.17. 

Theorem 10.18 

1) If $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal resolution of the 
identity, then 
$$V = \mathrm{im}(\rho_1) \odot \cdots \odot \mathrm{im}(\rho_k)$$

2) Conversely, if $V = S_1 \odot \cdots \odot S_k$ and $\rho_i$ is projection on 
$S_i$ along $S_1 \odot \cdots \odot \widehat{S_i} \odot \cdots \odot S_k$, where the hat 
means that the corresponding term is missing from the direct sum, then 
$$\rho_1 + \cdots + \rho_k = \iota$$
is an orthogonal resolution of the identity. 

Proof. To prove (1), suppose that $\rho_1 + \cdots + \rho_k = \iota$ is an 
orthogonal resolution of the identity. According to Theorem 8.17, we 
have 
$$V = \mathrm{im}(\rho_1) \oplus \cdots \oplus \mathrm{im}(\rho_k)$$

However, since the $\rho_i$'s are orthogonal projections, they are 
self-adjoint, and so for $i \ne j$, 
$$\langle \rho_i(v), \rho_j(w) \rangle = \langle v, \rho_i\rho_j(w) \rangle = 0$$

which shows that 
$$V = \mathrm{im}(\rho_1) \odot \cdots \odot \mathrm{im}(\rho_k)$$

For the converse, we know from Theorem 8.17 that 
$\rho_1 + \cdots + \rho_k = \iota$ is a resolution of the identity, and we need 
only show that each $\rho_i$ is an orthogonal projection. But this follows 
from the fact that 
$$\mathrm{im}(\rho_i) = S_i \perp S_1 \odot \cdots \odot \widehat{S_i} \odot \cdots \odot S_k = \ker(\rho_i)$$ ■ 



The Spectral Theorem 

We can now characterize the orthogonally diagonalizable operators 
on a finite dimensional complex inner product space. 

Theorem 10.19 (The spectral theorem for normal operators) Let 
$\tau \in \mathcal{L}(V)$, where V is a finite dimensional complex inner product 
space. The following statements are equivalent. 

1) $\tau$ is orthogonally diagonalizable, that is, 
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$

2) $\tau$ is normal, that is, 
$$\tau\tau^* = \tau^*\tau$$

3) $\tau$ has the orthogonal spectral resolution 

(10.3) $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$ 

where $\lambda_i \in \mathbb{C}$ and where $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal 
resolution of the identity. 

Moreover, if $\tau$ has the form (10.3), where the $\lambda_i$'s are distinct and 
the $\rho_i$'s are nonzero, then the $\lambda_i$'s are the eigenvalues of $\tau$ and 
$\mathrm{im}(\rho_i)$ is the eigenspace of $\tau$ associated with $\lambda_i$. 

Proof. We have seen (Theorem 10.9) that (1) and (2) are equivalent. 
Suppose that $\tau$ is orthogonally diagonalizable. We know from 
Theorem 8.18 that (10.3) holds for some resolution of the identity, and 
we need only observe that since 
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$

this is an orthogonal resolution. Hence, (3) holds. 

Conversely, if (10.3) holds, we have 
$$V = \mathrm{im}(\rho_1) \odot \cdots \odot \mathrm{im}(\rho_k)$$

But Theorem 8.18 implies that $\mathrm{im}(\rho_i) = \mathcal{E}_{\lambda_i}$, and so $\tau$ is 
orthogonally diagonalizable. ■ 
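
To see what an orthogonal spectral resolution looks like in coordinates, the sketch below checks one by hand for the matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, which is normal but not self-adjoint. The matrix, its eigenvalues $\pm i$, and the two projection matrices were worked out here for illustration (each projection is $vv^*$ for a unit eigenvector $v$); none of this is taken from the text, and the helper names are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def conj_transpose(m):
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[0, -1], [1, 0]]
eigenvalues = [1j, -1j]
rho = [
    [[0.5, 0.5j], [-0.5j, 0.5]],    # projection onto span{(1, -i)}, eigenvalue i
    [[0.5, -0.5j], [0.5j, 0.5]],    # projection onto span{(1, i)}, eigenvalue -i
]
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]

is_normal = close(matmul(A, conj_transpose(A)), matmul(conj_transpose(A), A))
are_orthogonal_projections = all(
    close(matmul(p, p), p) and close(conj_transpose(p), p) for p in rho)
mutually_orthogonal = close(matmul(rho[0], rho[1]), Z2)
resolves_identity = close(
    [[rho[0][i][j] + rho[1][i][j] for j in range(2)] for i in range(2)], I2)

# lambda_1 * rho_1 + lambda_2 * rho_2 should give back A, as in (10.3).
reconstructed = [[sum(lam * p[i][j] for lam, p in zip(eigenvalues, rho))
                  for j in range(2)] for i in range(2)]
recovers_A = close(reconstructed, A)

print(is_normal, are_orthogonal_projections, mutually_orthogonal,
      resolves_identity, recovers_A)
```

Note that the projections have non-real entries even though A is real; over the reals this matrix has no eigenvectors at all, which is why the real spectral theorem is restricted to self-adjoint operators.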

In the real case, we have the following. 

Theorem 10.20 (The spectral theorem for self-adjoint operators) Let 
$\tau \in \mathcal{L}(V)$, where V is a finite dimensional real inner product space. 
The following statements are equivalent. 

1) $\tau$ is orthogonally diagonalizable, that is, 
$$V = \mathcal{E}_{\lambda_1} \odot \cdots \odot \mathcal{E}_{\lambda_k}$$

2) $\tau$ is self-adjoint, that is, 
$$\tau^* = \tau$$

3) $\tau$ has the orthogonal spectral resolution 

(10.4) $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$ 

where $\lambda_i \in \mathbb{R}$ and $\rho_1 + \cdots + \rho_k = \iota$ is an orthogonal 
resolution of the identity. 

Moreover, if $\tau$ has the form (10.4), where the $\lambda_i$'s are distinct and 
the $\rho_i$'s are nonzero, then the $\lambda_i$'s are the eigenvalues of $\tau$, and 
$\mathrm{im}(\rho_i)$ is the eigenspace of $\tau$ associated with $\lambda_i$. ■ 



Functional Calculus 

Let us consider some applications of the spectral theorem. Recall 
that if V is a vector space over F, if $\tau \in \mathcal{L}(V)$, and if $p(x)$ is a 
polynomial over F, then the operator $p(\tau) \in \mathcal{L}(V)$ is well-defined. 
Now, suppose that V is a finite dimensional inner product space, and 
$\tau$ has spectral resolution $\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$. Then 
$\rho_i^m = \rho_i$ for $m \ge 1$, and $\rho_i\rho_j = 0$ for $i \ne j$. Thus, 
$$\tau^n = (\lambda_1\rho_1 + \cdots + \lambda_k\rho_k)^n = \lambda_1^n\rho_1 + \cdots + \lambda_k^n\rho_k$$

and, more generally, for any polynomial $p(x)$ over F, 
$$p(\tau) = p(\lambda_1)\rho_1 + \cdots + p(\lambda_k)\rho_k$$

In fact, we can extend this further by defining, for any function 
$f: \{\lambda_1, \dots, \lambda_k\} \to F$, 
$$f(\tau) = f(\lambda_1)\rho_1 + \cdots + f(\lambda_k)\rho_k$$

Thus, we may define $\sqrt{\tau}$, $e^\tau$, and so on. Notice, however, that 
since the spectral resolution of $\tau$ is a finite sum, we actually gain 
nothing (but convenience) by using functions other than polynomials. 
To see this, suppose that $f: \{\lambda_1, \dots, \lambda_k\} \to F$ is any function, and 
let 
$$f(\lambda_i) = \alpha_i$$

Then we can find a polynomial $p(x)$ for which $p(\lambda_i) = \alpha_i$ for 
$i = 1, \dots, k$, and so 
$$f(\tau) = f(\lambda_1)\rho_1 + \cdots + f(\lambda_k)\rho_k = p(\lambda_1)\rho_1 + \cdots + p(\lambda_k)\rho_k = p(\tau)$$

The study of the properties of functions of an operator $\tau$ is referred to 
as the functional calculus of $\tau$. 

According to the spectral theorem, if V is complex ($F = \mathbb{C}$) and 
$\tau$ is normal, then $f(\tau)$ is a normal operator whose eigenvalues are 
$f(\lambda_i)$. Similarly, if V is real ($F = \mathbb{R}$), and $\tau$ is self-adjoint, then 
$f(\tau)$ is self-adjoint, with eigenvalues $f(\lambda_i)$. Let us consider some 
special cases of this construction. 

For each $j = 1, \dots, k$, if $p_j(x)$ is a polynomial for which 
$$p_j(\lambda_j) = 1, \qquad p_j(\lambda_i) = 0 \text{ for } i \ne j$$

then 
$$p_j(\tau) = \rho_j$$

and so we see that each projection $\rho_j$ is a polynomial function of $\tau$. 

If $\tau$ is invertible, then $\lambda_i \ne 0$ for all i, and so we may let 
$f(x) = x^{-1}$, giving 
$$\tau^{-1} = \lambda_1^{-1}\rho_1 + \cdots + \lambda_k^{-1}\rho_k$$

as can easily be verified by direct calculation. 

If $f(\lambda_i) = \overline{\lambda_i}$ and if $\tau$ is normal, then each $\rho_i$ is 
self-adjoint, and so 
$$f(\tau) = \overline{\lambda_1}\rho_1 + \cdots + \overline{\lambda_k}\rho_k = \tau^*$$
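
The remark that each projection is a polynomial in the operator suggests a direct way to compute with the functional calculus. The following sketch is one way to do this, under assumptions made here: the polynomial $p_j$ is taken to be the Lagrange interpolation polynomial (one possible choice satisfying the two conditions above), the eigenvalues are supplied by hand and assumed distinct, and the function names are hypothetical.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def spectral_projection(A, eigenvalues, j):
    """rho_j = p_j(A), where p_j is the Lagrange polynomial equal to 1 at
    eigenvalues[j] and 0 at every other (distinct) eigenvalue."""
    n = len(A)
    result = identity(n)
    for i, lam in enumerate(eigenvalues):
        if i != j:
            factor = [[(A[r][c] - (lam if r == c else 0.0)) / (eigenvalues[j] - lam)
                       for c in range(n)] for r in range(n)]
            result = matmul(result, factor)
    return result

def apply_function(f, A, eigenvalues):
    """f(A) = f(lambda_1) rho_1 + ... + f(lambda_k) rho_k."""
    n = len(A)
    out = [[0.0] * n for _ in range(n)]
    for j, lam in enumerate(eigenvalues):
        rho = spectral_projection(A, eigenvalues, j)
        for r in range(n):
            for c in range(n):
                out[r][c] += f(lam) * rho[r][c]
    return out

# Real symmetric matrix with distinct eigenvalues 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalues = [1.0, 3.0]

A_inverse = apply_function(lambda x: 1 / x, A, eigenvalues)
A_root = apply_function(math.sqrt, A, eigenvalues)

print(close(matmul(A, A_inverse), identity(2)))  # f(x) = 1/x gives the inverse
print(close(matmul(A_root, A_root), A))          # f(x) = sqrt(x) gives a square root
```

This only works as written for an orthogonally diagonalizable matrix whose distinct eigenvalues are all listed; it is meant to mirror the formulas above, not to serve as a general-purpose routine.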

The functional calculus can be applied to the study of the 
commutativity properties of operators. Here are two simple examples. 

Theorem 10.21 Let $\tau$ have spectral resolution 
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

Then an operator $\sigma$ commutes with $\tau$ if and only if it commutes 
with each $\rho_i$. 

Proof. If $\sigma$ commutes with each $\rho_i$, then clearly $\sigma$ commutes with 
$\tau$. For the converse, we simply observe that $\rho_i$ is a polynomial in 
$\tau$, and since $\sigma$ commutes with $\tau$, it commutes with any polynomial 
in $\tau$. ■ 

Theorem 10.22 Let V be a finite dimensional complex inner product 
space, and let $\tau, \sigma \in \mathcal{L}(V)$ be normal operators. Then $\tau$ and $\sigma$ 
commute if and only if they have the form $\tau = p(\theta)$, $\sigma = q(\theta)$, 
where p and q are polynomials, and $\theta = r(\tau, \sigma)$ is a polynomial in 
$\tau$ and $\sigma$. 

Proof. If $\tau$ and $\sigma$ are polynomials in $\theta$, then they clearly commute. 
For the converse, suppose that $\tau\sigma = \sigma\tau$, and let 
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$
and 
$$\sigma = \mu_1\nu_1 + \cdots + \mu_m\nu_m$$
be the orthogonal spectral resolutions of $\tau$ and $\sigma$. Then according to 
Theorem 10.21, $\rho_i\nu_j = \nu_j\rho_i$. Now, let us choose any polynomial 
$r(x, y)$ with the property that the values $\alpha_{ij} = r(\lambda_i, \mu_j)$ are 
distinct. Since each $\rho_i$ and $\nu_j$ is self-adjoint, we may set 
$\theta = r(\tau, \sigma)$ and deduce that 
$$\theta = r(\tau, \sigma) = \sum_{i,j} \alpha_{ij}\rho_i\nu_j$$

We also choose $p(x)$ and $q(x)$ so that $p(\alpha_{ij}) = \lambda_i$ for all j and 
$q(\alpha_{ij}) = \mu_j$ for all i. Then 
$$p(\theta) = \sum_{i,j} p(\alpha_{ij})\rho_i\nu_j = \sum_{i,j} \lambda_i\rho_i\nu_j = \Big(\sum_i \lambda_i\rho_i\Big)\Big(\sum_j \nu_j\Big) = \sum_i \lambda_i\rho_i = \tau$$

and similarly, $q(\theta) = \sigma$. ■ 

Positive Operators 

One of the most important cases of the functional calculus is when 
$f(x) = \sqrt{x}$. First, we need some definitions. 

Definition A self-adjoint linear operator $\tau \in \mathcal{L}(V)$ is nonnegative if 
$\langle \tau(v), v \rangle \ge 0$ for all $v \in V$, and positive if it is nonnegative and 
$\langle \tau(v), v \rangle > 0$ for $v \ne 0$. □ 

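Theorem 10.23 A self-adjoint operator $\tau$ on a finite dimensional inner 
product space is 

1) nonnegative if and only if all of its eigenvalues are nonnegative 

2) positive if and only if all of its eigenvalues are positive. 

Proof. If $\langle \tau(v), v \rangle \ge 0$ and $\tau(v) = \lambda v$ with $v \ne 0$, then 
$0 \le \langle \tau(v), v \rangle = \lambda \langle v, v \rangle$, and so $\lambda \ge 0$. Conversely, if all 
eigenvalues of $\tau$ are nonnegative, then we have 
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k, \qquad \lambda_i \ge 0$$

and since $\iota = \rho_1 + \cdots + \rho_k$, 
$$\langle \tau(v), v \rangle = \sum_{i,j} \lambda_i \langle \rho_i(v), \rho_j(v) \rangle = \sum_i \lambda_i \|\rho_i(v)\|^2 \ge 0$$

and so $\tau$ is nonnegative. Part (2) is proved similarly. ■ 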

If $\tau$ is a nonnegative operator, with spectral resolution 
$$\tau = \lambda_1\rho_1 + \cdots + \lambda_k\rho_k$$

then we may take the nonnegative square root of $\tau$, 
$$\sqrt{\tau} = \sqrt{\lambda_1}\rho_1 + \cdots + \sqrt{\lambda_k}\rho_k$$
where $\sqrt{\lambda_i}$ is the nonnegative square root of $\lambda_i$. 

It is clear that 
$$(\sqrt{\tau})^2 = \tau$$

and it is not hard to see that $\sqrt{\tau}$ is the only nonnegative operator 
whose square is $\tau$. In other words, every nonnegative operator has a 
unique nonnegative square root. Conversely, if $\tau$ has a nonnegative 
square root, that is, if $\tau = \sigma^2$ for some nonnegative operator $\sigma$, then 
$\tau$ is nonnegative. Hence, an operator $\tau$ is nonnegative if and only if 
it has a nonnegative square root. 

Here is an application of square roots. 

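Theorem 10.24 If $\tau$ and $\sigma$ are nonnegative operators, and 
$\tau\sigma = \sigma\tau$, then $\tau\sigma$ is nonnegative. 

Proof. Since $\tau$ is a nonnegative operator, it has a nonnegative square 
root $\sqrt{\tau}$, which is a polynomial in $\tau$, and similarly for $\sigma$. 
Therefore, since $\tau$ and $\sigma$ commute, so do $\sqrt{\tau}$ and $\sqrt{\sigma}$. Hence, 
$$(\sqrt{\tau}\sqrt{\sigma})^2 = (\sqrt{\tau})^2(\sqrt{\sigma})^2 = \tau\sigma$$

Since $\sqrt{\tau}$ and $\sqrt{\sigma}$ are self-adjoint and commute, their product is 
self-adjoint, and so $\tau\sigma$, being the square of a self-adjoint operator, 
is nonnegative. ■ 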



The Polar Decomposition of an Operator 

It is well-known that any nonzero complex number z can be 
written in the polar form $z = re^{i\theta}$, where r is a positive number, and 
$\theta$ is real. We can do the same for any nonzero linear operator $\tau$ on a 
finite dimensional complex inner product space. 

Theorem 10.25 Let $\tau$ be a nonzero linear operator on a finite 
dimensional complex inner product space V. Then there exists a 
unique nonnegative operator $\rho$, and a unitary operator $\nu$, for which 
$\tau = \nu\rho$. Moreover, if $\tau$ is invertible, then $\rho$ is positive and $\nu$ 
is also unique. 

Proof. Let us suppose for a moment that $\tau = \nu\rho$. Then 
$\tau^* = (\nu\rho)^* = \rho^*\nu^* = \rho\nu^{-1}$, and so 
$$\tau^*\tau = \rho\nu^{-1}\nu\rho = \rho^2$$

Also, if $v \in V$, then 
$$\tau(v) = \nu(\rho(v))$$

These equations give us the clue as to how to define $\rho$ and $\nu$. 

Let us define $\rho$ to be the unique nonnegative square root of the 
nonnegative operator $\tau^*\tau$. Then 

(10.5) $\|\rho(v)\|^2 = \langle \rho(v), \rho(v) \rangle = \langle \rho^2(v), v \rangle = \langle \tau^*\tau(v), v \rangle = \|\tau(v)\|^2$ 

Let us define $\nu$ on the image $\mathrm{im}(\rho)$ by 

(10.6) $\nu(\rho(v)) = \tau(v)$ 

for all $v \in V$. To see that this is well-defined, observe that (10.5) gives 
$$\rho(v) = \rho(w) \Rightarrow \rho(v - w) = 0 \Rightarrow \|\rho(v - w)\| = 0 \Rightarrow \|\tau(v - w)\| = 0 \Rightarrow \tau(v) = \tau(w)$$

Moreover, $\nu$ is an isometry (on its domain $\mathrm{im}(\rho)$), since (10.5) again 
gives 
$$\|\nu(\rho(v))\| = \|\tau(v)\| = \|\rho(v)\|$$

Since $\nu: \mathrm{im}(\rho) \to \mathrm{im}(\nu)$ is injective, we have 
$$\dim(\mathrm{im}(\rho)) = \dim(\mathrm{im}(\nu))$$
and so 
$$\dim(\mathrm{im}(\rho)^\perp) = \dim(\mathrm{im}(\nu)^\perp)$$

which means that we may extend $\nu$ to a unitary map (perhaps in 
many ways) on V. Equation (10.6) then shows that $\tau = \nu\rho$. 

As for the uniqueness, suppose that $\tau = \nu\rho = \nu'\rho'$. Then 
$$\tau^*\tau = \rho^2 \quad \text{and} \quad \tau^*\tau = (\rho')^2$$

and so $\rho^2 = (\rho')^2$, and since $\rho^2$ has a unique nonnegative square root, 
we deduce that $\rho = \rho'$. Thus, $\rho$ is unique. Finally, if $\tau$ is 
invertible, then (10.5) shows that $\rho$ is also invertible. Hence, $\rho$ is a 
bijection, and so (10.6) uniquely determines $\nu$. ■ 
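
The construction in the proof can be followed step by step for a small invertible matrix. The sketch below is an illustration under assumptions made here: it works with a real 2 x 2 matrix (so the adjoint is the transpose), and it computes the nonnegative square root with a closed form that is special to 2 x 2 matrices and follows from the Cayley-Hamilton theorem, rather than with the spectral resolution used in the text. All function names are hypothetical.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def sqrt_nonnegative_2x2(m):
    """Nonnegative square root of a nonzero 2x2 real symmetric nonnegative
    matrix m: sqrt(m) = (m + s I) / t with s = sqrt(det m), t = sqrt(tr m + 2 s)."""
    s = math.sqrt(max(m[0][0] * m[1][1] - m[0][1] * m[1][0], 0.0))
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t],
            [m[1][0] / t, (m[1][1] + s) / t]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

tau = [[1.0, 2.0], [0.0, 1.0]]                        # invertible, not normal
rho = sqrt_nonnegative_2x2(matmul(transpose(tau), tau))  # rho = sqrt(tau* tau)
nu = matmul(tau, inverse_2x2(rho))                    # forced by (10.6) when rho is invertible

I2 = [[1.0, 0.0], [0.0, 1.0]]
print(close(matmul(nu, rho), tau))            # tau = nu rho
print(close(matmul(transpose(nu), nu), I2))   # nu is unitary (here: orthogonal)
print(close(rho, transpose(rho)))             # rho is self-adjoint
```

Because this $\tau$ is invertible, there is no freedom in choosing $\nu$; for a singular $\tau$ the extension step in the proof would have to be carried out instead of inverting $\rho$.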



Applying the previous theorem to the map $\tau^*$, say $\tau^* = \nu\rho$, we get 
$$\tau = (\tau^*)^* = (\nu\rho)^* = \rho\nu^{-1} = \rho\mu$$

where $\mu = \nu^{-1}$ is unitary. We leave it as an exercise to show that 
any unitary operator $\mu$ has the form $\mu = e^{i\sigma}$, where $\sigma$ is a 
self-adjoint operator. This gives the following corollary. 

Corollary 10.26 (Polar decomposition) Let $\tau$ be a nonzero linear 
operator on a finite dimensional complex inner product space. Then 
there is a unique nonnegative operator $\rho$ and a self-adjoint operator 
$\sigma$ for which $\tau$ has the polar decomposition 
$$\tau = \rho e^{i\sigma}$$

Normal operators can be characterized using the polar 
decomposition. 

Theorem 10.27 Let $\tau = \rho e^{i\sigma}$ be a polar decomposition of a nonzero 
linear operator $\tau$. Then $\tau$ is normal if and only if $\rho\sigma = \sigma\rho$. 

Proof. Since 
$$\tau\tau^* = \rho e^{i\sigma} e^{-i\sigma} \rho = \rho^2$$
and 
$$\tau^*\tau = e^{-i\sigma} \rho\rho e^{i\sigma} = e^{-i\sigma} \rho^2 e^{i\sigma}$$

we see that $\tau$ is normal if and only if 
$$e^{-i\sigma} \rho^2 e^{i\sigma} = \rho^2$$

or equivalently, 

(10.7) $\rho^2 e^{i\sigma} = e^{i\sigma} \rho^2$ 

Now, $\rho$ is a function of $\rho^2$, and $\sigma$ is a function of $e^{i\sigma}$, and so 
(10.7) holds if and only if $\rho\sigma = \sigma\rho$. ■ 



EXERCISES 

1. Prove that $\tau$ is self-adjoint (unitary) if and only if any matrix 
that represents $\tau$, with respect to an ordered orthonormal basis 
$\mathcal{O}$, is Hermitian (unitary). (Substitute the correct terms when 
$F = \mathbb{R}$.) 

2. Show that if $\tau$ is self-adjoint, then so is $\tau^n$ for any 
$n \in \mathbb{N}$. 

3. Let $\tau \in \mathcal{L}(V)$, and let 
$$\tau_1 = \tfrac{1}{2}(\tau + \tau^*) \quad \text{and} \quad \tau_2 = \tfrac{1}{2i}(\tau - \tau^*)$$

Show that $\tau_1$ and $\tau_2$ are self-adjoint, and that 
$$\tau = \tau_1 + i\tau_2 \quad \text{and} \quad \tau^* = \tau_1 - i\tau_2$$

What can you say about the uniqueness of these representations of 
$\tau$ and $\tau^*$? 

4. Show that a nonzero self-adjoint operator cannot be nilpotent. 

5. Prove that all of the roots of the characteristic polynomial of a 
skew-Hermitian matrix are pure imaginary. 

6. Prove that if $\tau$ is unitary, then so is $\tau^{-1}$. 

7. Prove that if $\sigma, \tau$ are unitary, then so is $\sigma\tau$. 

8. Prove that a normal operator is unitary if and only if all of its 
eigenvalues have absolute value 1. 

9. Let $\tau$ be a unitary operator on a finite dimensional inner product 
space V. Show that if a subspace S of V is invariant under $\tau$, 
then so is $S^\perp$. 

10. Give an example of a normal operator that is neither self-adjoint 
nor unitary. 

11. Prove that if $\|\tau(v)\| = \|\tau^*(v)\|$ for all $v \in V$, where V is 
complex, then $\tau$ is normal. 

12. Show that if $\tau$ is a normal operator on a finite dimensional inner 
product space, then $\tau^* = p(\tau)$, for some polynomial $p(x) \in F[x]$. 

13. Show that a linear operator $\tau$ on a finite dimensional inner 
product space V is normal if and only if whenever S is an 
invariant subspace under $\tau$, so is $S^\perp$. 

14. Let V be a finite dimensional inner product space, and let $\tau$ be 
a normal operator on V. 

a) Prove that if $\tau$ is idempotent, then it is also self-adjoint. 

b) Prove that if $\tau$ is nilpotent, then $\tau = 0$. 

c) Prove that if $\tau^3 = \tau^2$, then $\tau$ is idempotent. 

15. Show that if $\tau$ is a normal operator on a finite dimensional 
complex inner product space, then the algebraic multiplicity is 
equal to the geometric multiplicity for all eigenvalues of $\tau$. 

16. Use the results of the previous exercise to show that if $\tau$ is 
normal, and if $\sigma\tau = \tau\sigma$, then $\sigma\tau^* = \tau^*\sigma$. In other words, 
$\tau^*$ commutes with all operators that commute with $\tau$. 

17. Recall that it is possible for two projections to have the property 
that $\sigma\rho$ is a projection, but $\rho\sigma$ is not. Show that this cannot 
happen if $\rho$ and $\sigma$ are both orthogonal projections. 

18. Show that two orthogonal projections $\sigma$ and $\rho$ are orthogonal 
to each other if and only if $\mathrm{im}(\sigma) \perp \mathrm{im}(\rho)$. 

19. Show that the spectral resolution of a normal operator is unique. 

20. If $\nu$ is a unitary operator on a complex inner product space, show 
that there exists a self-adjoint operator $\sigma$ for which $\nu = e^{i\sigma}$. 

21. Show that, in the complex case, we need not specify that $\tau$ is 
self-adjoint in defining nonnegative operators. 

22. Show that a nonnegative operator has a unique nonnegative 
square root. 

23. Let $\alpha_i$ and $\beta_i$ be complex numbers, for $i = 1, \dots, k$, with the 
$\alpha_i$ distinct. Construct a polynomial $p(x)$ for which 
$p(\alpha_i) = \beta_i$ for all i. 

24. Prove that if $\tau$ has a square root, that is, if $\tau = \sigma^2$ for some 
nonnegative operator $\sigma$, then $\tau$ is nonnegative. 

25. Prove that a self-adjoint operator on a finite dimensional inner 
product space is positive if and only if all of its eigenvalues are 
positive. 

26. Prove that if $\sigma \le \tau$ and if $\theta$ is a positive operator that 
commutes with both $\sigma$ and $\tau$, then $\sigma\theta \le \tau\theta$. 

27. Does every self-adjoint operator on a finite dimensional real inner 
product space have a square root? 

28. Let $\tau$ be a linear operator on $\mathbb{C}^n$, and let $\lambda_1, \dots, \lambda_n$ be the 
eigenvalues of $\tau$, each one written a number of times equal to its 
algebraic multiplicity. Show that 
$$\sum_i |\lambda_i|^2 \le \mathrm{tr}(\tau^*\tau)$$
where tr is the trace, defined in the exercises in Chapter 8. Show 
also that equality holds if and only if $\tau$ is normal. 




Part 2 

Topics 




CHAPTER 11 

Metric Vector Spaces 



Contents: Symmetric, Skew-Symmetric and Alternate Forms. The 
Matrix of a Bilinear Form. Quadratic Forms. Linear Functionals. 
Orthogonality. Orthogonal Complements. Orthogonal Direct Sums. 
Quotient Spaces. Symplectic Geometry - Hyperbolic Planes. Orthogonal 
Geometry - Orthogonal Bases. The Structure of an Orthogonal 
Geometry. Isometries. Symmetries. Witt's Cancellation Theorem. 
Witt's Extension Theorem. Maximum Hyperbolic Subspaces. 
Exercises. 



Symmetric, Skew-Symmetric and Alternate Forms 

In this chapter, we study vector spaces over arbitrary fields that 
have a bilinear form defined on them. As we will see, the study of such 
vector spaces has a very geometric flavor, and hence so does the 
terminology. 

Unless otherwise mentioned, all vector spaces are assumed to be 
finite dimensional. The symbol F denotes an arbitrary field, and F_q 
denotes a finite field of size q. 

Definition Let V be a vector space over F. A mapping ( , ): V x V → F 
is called a bilinear form if it is a linear function of each coordinate, that 
is, if 

(αx + βy, z) = α(x,z) + β(y,z) 

and 

(z, αx + βy) = α(z,x) + β(z,y) 

A bilinear form is 

1) symmetric if 

(x,y) = (y,x) 

for all x, y ∈ V. 

2) skew-symmetric if 

(x,y) = -(y,x) 

for all x, y ∈ V. 

3) alternate if 

(x,x) = 0 

for all x ∈ V. □ 



Definition A bilinear form that is either symmetric, skew-symmetric, or 
alternate is referred to as an inner product, and a pair (V, ( , )), where V 
is a vector space and ( , ) is an inner product on V, is called a metric 
vector space. □ 

Notice that the real inner products discussed in Chapter 9 are 
inner products in the present sense and have the additional property of 
being positive definite. On the other hand, the complex inner products 
of Chapter 9, being sesquilinear, are not inner products in the present 
sense. Note also that metric vector spaces should not be confused with 
metric spaces, which we will study in the next chapter. 

As is traditional, when the inner product is understood, we will 
use the phrase “let V be a metric vector space.” 

Definition Let V be a metric vector space over a field F. If ( , ) is 
symmetric, then V is called an orthogonal geometry over F, and if 
( , ) is alternate, then V is called a symplectic geometry over F. □ 

Thus, a real inner product space is an orthogonal geometry, but a 
complex inner product space is not an orthogonal geometry. 

As we will see, not all metric vector spaces behave as nicely as the 
real inner product spaces, and this necessitates the introduction of a 
new set of terminology to cover various types of behavior. Here is one 
example. 

Definition A metric vector space is nonsingular (or nondegenerate) if 

(x,v) = 0 for all v ∈ V ⟹ x = 0 □ 



Example 11.1 Minkowski space is the four-dimensional nonsingular 
real orthogonal geometry R^4 with inner product defined by 

(e_1,e_1) = (e_2,e_2) = (e_3,e_3) = 1 
(e_4,e_4) = -1 
(e_i,e_j) = 0 for i ≠ j 

where e_1,...,e_4 is the standard basis for R^4. □ 

The concepts of being symmetric, skew-symmetric and alternate 
are not independent. However, their relationship depends on the 
characteristic of the base field F. 

Theorem 11.1 Let V be a vector space over a field F. 

1) If char(F) = 2, then a bilinear form on V is skew-symmetric if 
and only if it is symmetric. Furthermore, an alternate bilinear 
form is symmetric (and skew-symmetric). 

2) If char(F) ≠ 2, then a bilinear form on V is skew-symmetric if 
and only if it is alternate. 

Proof. First, we observe that for any field, if ( , ) is alternate, then 
0 = (x + y,x + y) = (x,x) + (x,y) + (y,x) + (y,y) = (x,y) + (y,x) 

Thus, 

(x,y) + (y,x) = 0 
or 

(x,y) = -(y,x) 

which shows that (,) is skew-symmetric. Thus, alternate implies skew- 
symmetric. 

Now, if char(F) = 2, then -a = a for all a ∈ F, and so the 
definitions of symmetric and skew-symmetric are equivalent. Suppose 
that char(F) ≠ 2. Then if ( , ) is skew-symmetric, for any x ∈ V, we 
have 

(x,x) = -(x,x) 
or 

2(x,x) = 0 

which implies that (x,x) = 0, since 2 ≠ 0 in F. Hence, ( , ) is 
alternate. ∎ 

Theorem 11.1 tells us that we do not need to consider skew- 
symmetric forms per se, since skew-symmetric is always equivalent to 
either symmetric or alternate. 

Example 11.2 The standard inner product on V(n,q), defined by 

(x_1,...,x_n)·(y_1,...,y_n) = x_1 y_1 + ... + x_n y_n 

is symmetric, but not alternate, since 

(1,0,...,0)·(1,0,...,0) = 1 ≠ 0 □ 

The Matrix of a Bilinear Form 

If ℬ = (b_1,...,b_n) is an ordered basis for a metric vector space 
V, then the form ( , ) is completely determined by the n x n matrix of 
values 

M_ℬ = (a_ij) = ((b_i,b_j)) 

which is referred to as the matrix of the form ( , ) with respect to the 
ordered basis ℬ. 

Observe that if x = Σ x_i b_i and y = Σ y_j b_j, then 

(x,y) = Σ_i Σ_j x_i (b_i,b_j) y_j = [x]_ℬ^T M_ℬ [y]_ℬ 

where [x]_ℬ and [y]_ℬ are the coordinate matrices of x and y, 
respectively. 

Notice also that a form is symmetric if and only if the matrix 
M_ℬ = (a_ij) of the form satisfies 

a_ij = a_ji 

for all 1 ≤ i,j ≤ n, that is, if and only if M_ℬ is a symmetric matrix. 
Similarly, a form is alternate if and only if the matrix M_ℬ = (a_ij) of 
the form satisfies 

a_ii = 0, a_ij = -a_ji (i ≠ j) 

Such a matrix is referred to as alternate. 
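
The following short Python sketch illustrates how the matrix of a form is built from a basis and how the symmetric and alternate conditions can be tested on it. It is only an illustration over the prime field F_p; the helper names (gram_matrix, dot_mod, evaluate, is_symmetric, is_alternate) are ours and are not notation from the text.

```python
def gram_matrix(form, basis):
    """Matrix of a bilinear form: entry (i, j) is form(b_i, b_j)."""
    return [[form(bi, bj) for bj in basis] for bi in basis]


def dot_mod(p):
    """Standard inner product on n-tuples over F_p (p prime)."""
    return lambda x, y: sum(a * b for a, b in zip(x, y)) % p


def evaluate(m, x, y, p):
    """Compute [x]^T M [y] over F_p from coordinate tuples x and y."""
    n = len(m)
    return sum(x[i] * m[i][j] * y[j] for i in range(n) for j in range(n)) % p


def is_symmetric(m, p):
    n = len(m)
    return all((m[i][j] - m[j][i]) % p == 0 for i in range(n) for j in range(n))


def is_alternate(m, p):
    """Zero diagonal and a_ij = -a_ji for i != j, all taken mod p."""
    n = len(m)
    zero_diagonal = all(m[i][i] % p == 0 for i in range(n))
    skew = all((m[i][j] + m[j][i]) % p == 0
               for i in range(n) for j in range(n) if i != j)
    return zero_diagonal and skew


standard_basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
M = gram_matrix(dot_mod(2), standard_basis)  # matrix of the standard form on V(3,2)
H = [[0, 1], [1, 0]]                         # symmetric and alternate over F_2
```

Here M is the identity matrix, which is symmetric but not alternate, while H is both symmetric and alternate over F_2 but is no longer alternate when read over F_3, in agreement with Theorem 11.1.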

Now let us see how the matrix of a form behaves with respect to a 
change of basis. Let C = (c_1,...,c_n) be an ordered basis for V. Recall 
from Chapter 2 that the change of basis matrix M_{C,ℬ}, whose ith 
column is [c_i]_ℬ, satisfies 

[v]_ℬ = M_{C,ℬ} [v]_C 

Hence, 

(x,y) = [x]_ℬ^T M_ℬ [y]_ℬ 

= ([x]_C^T M_{C,ℬ}^T) M_ℬ (M_{C,ℬ} [y]_C) 

= [x]_C^T (M_{C,ℬ}^T M_ℬ M_{C,ℬ}) [y]_C 

and so 

M_C = M_{C,ℬ}^T M_ℬ M_{C,ℬ} 
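
As a quick numerical check of this change of basis rule, the sketch below uses a small integer example. The matrices M_B and P are values chosen by us for illustration, and the helpers (transpose, matmul, form_value, to_B) are hypothetical names, not part of the text.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def form_value(m, x, y):
    """[x]^T M [y] for coordinate sequences x and y."""
    column_y = [[t] for t in y]
    return matmul(matmul([x], m), column_y)[0][0]


# Matrix of a symmetric form with respect to a basis B = (b_1, b_2).
M_B = [[2, 1], [1, 3]]
# Change of basis matrix; its columns are the B-coordinates of
# c_1 = b_1 and c_2 = b_1 + b_2.
P = [[1, 1], [0, 1]]
# Matrix of the same form with respect to C = (c_1, c_2).
M_C = matmul(matmul(transpose(P), M_B), P)


def to_B(coords_C):
    """Convert C-coordinates to B-coordinates: [v]_B = P [v]_C."""
    return [row[0] for row in matmul(P, [[t] for t in coords_C])]
```

In this example M_C works out to [[2, 3], [3, 7]], and evaluating the form through M_C on C-coordinates agrees with evaluating it through M_B on the corresponding B-coordinates.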

This prompts the following definition. 

Definition Two n x n matrices A and B over F are congruent if 
there exists an invertible matrix P for which 

A = P B P^T □ 

(Setting Q = P^T gives P B P^T = Q^T B Q, and so it doesn't matter 
whether we use P B P^T or P^T B P in the preceding definition.) Let us 
summarize. 



Theorem 11.2 If the matrix of a bilinear form on V with respect to 
an ordered basis ℬ = (b_1,...,b_n) is 

M_ℬ = (a_ij) 

then 

(x,y) = [x]_ℬ^T M_ℬ [y]_ℬ 

Furthermore, if C = (c_1,...,c_n) is also an ordered basis for V, then we 
have 

M_C = M_{C,ℬ}^T M_ℬ M_{C,ℬ} 

where M_{C,ℬ} is the change of basis matrix from C to ℬ, whose ith 
column is [c_i]_ℬ. ∎ 



Thus, if two matrices represent the same bilinear form on V, they 
must be congruent. Conversely, congruent matrices represent the same 
bilinear form on V. For suppose that B = M_ℬ represents a bilinear 
form on V, with respect to the ordered basis ℬ, and that 

A = P^T B P 

where P is nonsingular. We saw in Chapter 2 (see the discussion 
following Theorem 2.12) that there is an ordered basis C for V with 
the property that 

P = M_{C,ℬ} 

and so 

A = M_{C,ℬ}^T M_ℬ M_{C,ℬ} 

Thus, A = M_C represents the same form with respect to C. 

Theorem 11.3 Two matrices A and B represent the same bilinear 
form on V if and only if they are congruent. ∎ 

In view of the fact that congruent matrices have the same rank, 
we may define the rank of a bilinear form to be the rank of any matrix 
that represents that form. 

Note that a metric vector space V is nonsingular if and only if 
the matrix M_ℬ is nonsingular, for any ordered basis ℬ. 

If A and B are congruent matrices, say A = P B P^T, then 

det(A) = det(P B P^T) = det(P)^2 det(B) 

and so det(A) and det(B) differ by a square factor. The 
discriminant of a bilinear form is the set of all determinants of the 
matrices that represent the form under all choices of ordered bases. 
Thus, if det(A) = d for some matrix A representing the form, then 
the discriminant of the form is the set {r^2 d | 0 ≠ r ∈ F}. While it is 
true that the discriminant often does not give us much information 
about the matrix of the form in question, we will see a case where the 
discriminant is a complete invariant for congruence of matrices. 
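
To make the discriminant concrete, the following brute-force sketch collects the determinants of all matrices congruent to a fixed 2 x 2 matrix over F_5 and compares the result with the set {r^2 d}. The field F_5, the matrix B and the helper names are our own choices for illustration.

```python
from itertools import product


def det2(m, p):
    """Determinant of a 2 x 2 matrix over F_p."""
    return (m[0][0] * m[1][1] - m[0][1] * m[1][0]) % p


def congruent(b, q, p):
    """Return Q^T B Q over F_p for 2 x 2 matrices."""
    qt = [[q[0][0], q[1][0]], [q[0][1], q[1][1]]]

    def mul(x, y):
        return [[sum(x[i][k] * y[k][j] for k in range(2)) % p
                 for j in range(2)] for i in range(2)]

    return mul(mul(qt, b), q)


def discriminant(d, p):
    """The set {r^2 d : r a nonzero element of F_p}."""
    return {(r * r * d) % p for r in range(1, p)}


p = 5
B = [[1, 0], [0, 2]]  # det(B) = 2, a nonsquare mod 5
dets = set()
for entries in product(range(p), repeat=4):
    Q = [list(entries[:2]), list(entries[2:])]
    if det2(Q, p) != 0:  # keep only invertible Q
        dets.add(det2(congruent(B, Q, p), p))
```

For this example the determinants that occur are exactly 2 and 3, which is the set of all square multiples of det(B) = 2 in F_5.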



Quadratic Forms 

There is a close link between symmetric bilinear forms and 
another important type of function defined on a vector space. 

Definition A quadratic form on a vector space V is a map Q: V → F 
with the following properties 

1) Q(rv) = r^2 Q(v) for all r ∈ F, v ∈ V 

2) The map (u,v)_Q = Q(u + v) - Q(u) - Q(v) is a (symmetric) 
bilinear form. □ 

Every quadratic form Q defines a symmetric bilinear form, by 
(2). On the other hand, if char(F) ≠ 2, and if ( , ) is a symmetric 
bilinear form on V, then we can define a quadratic form Q by 

Q(x) = (1/2)(x,x) 

We leave it to the reader to verify that this is indeed a quadratic form. 
Moreover, if Q is defined from a bilinear form in this way, then the 
bilinear form associated with Q is 

(u,v)_Q = Q(u + v) - Q(u) - Q(v) 

= (1/2)(u + v,u + v) - (1/2)(u,u) - (1/2)(v,v) 

= (1/2)(u,v) + (1/2)(v,u) = (u,v) 

which is the original bilinear form. In other words, the maps ( , ) → Q 
and Q → ( , )_Q are inverses, and so there is a one-to-one correspondence 
between symmetric bilinear forms on V and quadratic forms on V. 
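
The correspondence can be checked numerically. The sketch below works over the rationals (so char(F) ≠ 2) with a symmetric matrix M that we picked for illustration; the function names are ours.

```python
from fractions import Fraction

M = [[2, 1], [1, 3]]  # symmetric matrix chosen for illustration


def form(x, y):
    """The symmetric bilinear form (x,y) = [x]^T M [y] over Q."""
    return sum(Fraction(x[i]) * M[i][j] * y[j]
               for i in range(2) for j in range(2))


def Q(x):
    """Quadratic form Q(x) = (1/2)(x,x)."""
    return Fraction(1, 2) * form(x, x)


def form_from_Q(u, v):
    """Bilinear form recovered from Q: Q(u + v) - Q(u) - Q(v)."""
    s = [a + b for a, b in zip(u, v)]
    return Q(s) - Q(u) - Q(v)
```

On any pair of vectors, form_from_Q returns the same value as form, and Q scales by r^2 under v → rv, as the definition requires.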

Again assuming that char(F) ≠ 2, if ℬ = (b_1,...,b_n) is an 
ordered basis for an orthogonal geometry V, and if the matrix of the 
symmetric form on V is M_ℬ = (a_ij), then for x = Σ x_i b_i, 

Q(x) = (1/2)(x,x) = (1/2) [x]_ℬ^T M_ℬ [x]_ℬ = (1/2) Σ_{i,j} a_ij x_i x_j 

and so Q(x) = Q(x_1,...,x_n) is a homogeneous polynomial of degree 2 
in the coordinates x_i. (The term form means homogeneous 
polynomial; hence the term quadratic form.) 

Linear Functionals 

Let V be a metric vector space over F. The vector space of all 
linear functionals on V is known as the algebraic dual space of V and 
is denoted by V*. Moreover, for finite dimensional vector spaces, we 
have dim(V) = dim(V*). 

Now let x ∈ V, and consider the map φ_x: V → F defined by 

φ_x(v) = (v,x) 

which is easily seen to be a linear functional. Hence, we can define a 
function τ: V → V* by 

τ(x) = φ_x 

This function is easily seen to be linear, and its kernel is 

{x ∈ V | φ_x = 0} = {x ∈ V | (v,x) = 0 for all v ∈ V} 

Hence, if V is nonsingular, the kernel of τ is the zero subspace, and 
τ is injective. Moreover, since dim(V) = dim(V*), we deduce that τ 
is surjective, and so it is an isomorphism from V onto V*. This 
implies that every linear functional on V has the form φ_x for some 
x ∈ V. We have proved the Riesz representation theorem for 
nonsingular metric vector spaces. 

Theorem 11.4 (The Riesz representation theorem) Let V be a 
nonsingular metric vector space, and let f ∈ V* be a linear functional 
on V. Then there exists a unique vector x ∈ V for which 

f(v) = (v,x) 

for all v ∈ V. ∎ 



Orthogonality 

A vector x is orthogonal to a vector y, written x ⊥ y, if 
(x,y) = 0. Any nonzero vector x that is orthogonal to itself is called a 
null vector, or an isotropic vector. 

The following result explains why we restrict attention to 
symmetric or alternate forms (which includes skew-symmetric forms). 

Theorem 11.5 Let ( , ) be a bilinear form on V. Then orthogonality is 
a symmetric relation, that is, 

(11.1) x ⊥ y ⟹ y ⊥ x 

if and only if ( , ) is either symmetric or alternate. Thus, in these cases, 
we may use the phrase “x and y are orthogonal.” 

Proof. It is clear that (11.1) holds if ( , ) is symmetric. If ( , ) is 
alternate, then it is skew-symmetric, and so (11.1) also holds. For the 
converse, suppose that (11.1) holds. For x, y and z in V, let 

w = (x,y)z - (x,z)y 

Then (x,w) = (x,y)(x,z) - (x,z)(x,y) = 0, that is, x ⊥ w, and so by 
assumption, w ⊥ x. But this is equivalent to 

(11.2) (x,y)(z,x) - (x,z)(y,x) = 0 

Setting y = x gives 

(11.3) (x,x)((z,x) - (x,z)) = 0 

for all vectors x and z in V. Exchanging x and z, and 
multiplying by -1, give 

(z,z)((z,x) - (x,z)) = 0 

Thus, we deduce that, for any vectors u and v in V, if 
(u,v) ≠ (v,u), then u and v are null vectors. Equivalently, if u is 
nonnull, then (u,v) = (v,u) for all v ∈ V. 

Now, suppose that ( , ) is not symmetric. Then there exist 
vectors u and v for which (u,v) ≠ (v,u). Hence (u,u) = (v,v) = 0. 
We wish to show that (a,a) = 0 for any a ∈ V, which will show that 
( , ) is alternate. 

Since u is null, 

(11.4) (u + a,u + a) = (u,a) + (a,u) + (a,a) 

Now, if a were nonnull, then (a,x) = (x,a) for all x ∈ V; in 
particular, (a,u) = (u,a) and (a,v) = (v,a). Furthermore, setting y = 
a, x = u, z = v in (11.2) gives 

(u,a)(v,u) - (u,v)(a,u) = 0 

which is equivalent to 

(u,a)((v,u) - (u,v)) = 0 

and since (u,v) ≠ (v,u), we must have 

(a,u) = (u,a) = 0 

Similarly, 

(a,v) = (v,a) = 0 

Hence, (11.4) becomes 

(u + a,u + a) = (a,a) 

But 

(u + a,v) = (u,v) ≠ (v,u) = (v,u + a) 

and so u + a is also null, showing that (a,a) = 0, which contradicts 
the assumption about a. Hence, all a ∈ V are null, and ( , ) is 
alternate. ∎ 

Orthogonal Complements 

If S is a vector subspace of a metric vector space V, then S inherits the 
metric structure from V. With this structure, we refer to S as a 
subspace of V. 

Definition Two subspaces S and T of a metric vector space V are 
orthogonal, denoted by S ⊥ T, if (s,t) = 0 for all s ∈ S and t ∈ T. 
The orthogonal complement of S, denoted by S^⊥, is the set 

S^⊥ = {v ∈ V | v ⊥ S} □ 

Definition If V is a metric vector space, then V^⊥ is called the radical 
of V, and denoted by Rad(V). □ 

Thus, V is nonsingular if and only if Rad(V) = {0}. Note that 
if S is a subspace of V, then the radical of S is Rad(S) = S ∩ S^⊥. 

It should be emphasized that the properties of orthogonality can 
be quite different for arbitrary base fields than for the familiar case of 
the real base field. For instance, in the case of real inner product 
spaces, we have S ∩ S^⊥ = {0}, whereas in the case of metric vector 
spaces over finite fields, for instance, we may even have S = S^⊥, as the 
next example shows. 

Example 11.3 It is easy to see that the subspace 

S = {0000,1100,0011,1111} 

of V(4,2) has the property that S = S^⊥. Note that V(4,2) is 
nonsingular, and yet the subspace S is quite singular. □ 
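
Since V(4,2) has only 16 vectors, the claim in Example 11.3 can be confirmed by direct enumeration. The sketch below assumes the standard inner product on V(4,2), as in Example 11.2; the variable names are ours.

```python
from itertools import product


def dot2(x, y):
    """Standard inner product on V(4,2)."""
    return sum(a * b for a, b in zip(x, y)) % 2


V = list(product((0, 1), repeat=4))
S = {(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)}

# Orthogonal complement of S and radical of V, by exhaustive search.
S_perp = {v for v in V if all(dot2(v, s) == 0 for s in S)}
radical_V = {v for v in V if all(dot2(v, w) == 0 for w in V)}
```

The search returns S_perp equal to S, while radical_V contains only the zero vector, so V(4,2) is nonsingular even though Rad(S) = S.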

The previous example notwithstanding, we do have the following 
important result concerning dimensions. 

Theorem 11.6 If S is a subspace of a nonsingular metric vector space 
V, then 

dim(S) + dim(S^⊥) = dim(V) 

Proof. For each v ∈ V, let φ_v be the linear functional in S* defined 
by 

φ_v(u) = (u,v) 

We define a map τ: V → S* by 

τ(v) = φ_v 

This map is linear, and its kernel is 

ker(τ) = {v ∈ V | φ_v = 0} = {v ∈ V | (u,v) = 0 for all u ∈ S} = S^⊥ 

Moreover, any linear functional f ∈ S* can be extended to a linear 
functional on V, which, by the Riesz representation theorem applied to 
the nonsingular space V, has the form v → (v,x) for some x ∈ V. 
Restricting to S gives f = φ_x = τ(x), and so τ is surjective, that is, 

im(τ) = S* 

The theorem then follows from the fact that dim(S*) = dim(S) and 

dim(im(τ)) + dim(ker(τ)) = dim(V) ∎ 

Theorem 11.7 If S is a subspace of a nonsingular metric vector space 
V, then 

1) S^⊥⊥ = S 

2) Rad(S) = S ∩ S^⊥ = Rad(S^⊥) □ 

Let us summarize the terminology related to orthogonality in 
metric vector spaces. (Unfortunately, authors vary somewhat on their 
use of the term isotropic.) 

Definition Let V be a metric vector space. 

1) A nonzero x ∈ V is null, or isotropic, if (x,x) = 0. 

2) The radical of V is Rad(V) = V^⊥. 

3) V is nonsingular, or nondegenerate, if V^⊥ = {0}. 

4) V is null if (u,v) = 0 for all u,v ∈ V, that is, if V^⊥ = V. 

5) V is isotropic if V contains at least one isotropic vector. 

6) V is anisotropic if V contains no isotropic vectors. 

7) V is totally isotropic if all nonzero vectors in V are isotropic. □ 

Orthogonal Direct Sums 

Definition Let V be a metric vector space. If S and T are 
subspaces of V with the property that V = S ⊕ T and S ⊥ T, then 
we say that V is the orthogonal direct sum of S and T and write 
V = S ⊙ T. □ 

In view of Example 11.3, it is reasonable to ask under what 
conditions on a subspace S it is true that V = S ⊙ S^⊥. The answer is 
given by the following theorem. 

Theorem 11.8 Let S be a subspace of a nonsingular metric vector 
space V. The following statements are equivalent. 

1) S is nonsingular 
2) S^⊥ is nonsingular 
3) S ∩ S^⊥ = {0} 
4) V = S + S^⊥ 
5) V = S ⊙ S^⊥ 

Proof. According to Theorem 11.7, statements 1, 2 and 3 are 
equivalent. For any subspaces S and T of a vector space V, 

dim(S + T) = dim(S) + dim(T) - dim(S ∩ T) 

and so, by Theorem 11.6, 

dim(S + S^⊥) = dim(S) + dim(S^⊥) - dim(S ∩ S^⊥) 

= dim(V) - dim(S ∩ S^⊥) 

which shows that 3 is equivalent to 4, and that 4 implies 5. Since 5 
clearly implies 4, the proof is complete. ∎ 

Most of the important results that we have established so far 
require that the space be nonsingular. Fortunately, the following 
theorem says that we may restrict attention to such spaces, without 
losing any important structure. 

Theorem 11.9 Let V be a metric vector space. Then 

V = Rad(V) ⊙ S 

where Rad(V) is null and S is nonsingular. 

Proof. Let S be a complement of Rad(V), that is, V = Rad(V) ⊕ S. 
Since all vectors are orthogonal to Rad(V), we have Rad(V) ⊥ S, and 
so V = Rad(V) ⊙ S. Now, if v ∈ Rad(S), then v ⊥ S, and so v ⊥ V, 
which implies that v ∈ Rad(V) ∩ S = {0}, that is, v = 0. Hence, 
Rad(S) = {0}, that is, S is nonsingular. ∎ 



Quotient Spaces 

In general, if S is a subspace of V, the quotient space V/S does 
not inherit a metric structure from V. However, if S = Rad(V) = V^⊥, 
then V/Rad(V) does inherit the metric structure of V as follows. Let 

(u + Rad(V), v + Rad(V)) = (u,v) 

To show that this inner product is well defined, we observe that if 

u + Rad(V) = u' + Rad(V) 

then u = u' + r, where r ∈ Rad(V). Hence, 

(u + Rad(V), v + Rad(V)) = (u,v) 

= (u' + r,v) = (u',v) = (u' + Rad(V), v + Rad(V)) 

and similarly for the second component. 

Symplectic Geometry - Hyperbolic Planes 

Let us consider a nonsingular symplectic geometry V. Thus, by 
definition, every nonzero vector in V is null. Given a nonzero u ∈ V, 
there must exist a v ∈ V for which (u,v) ≠ 0, since V is assumed 
nonsingular. Consider the two-dimensional subspace H with basis {u,v}. 
Then 

(u,u) = (v,v) = 0 

and (u,v) = a ≠ 0. Replacing v by a^(-1)v, we can assume that 

(u,v) = 1, (v,u) = -1 

The subspace H, thought of as a metric vector space, has matrix with 
respect to the basis {u,v} 

M = [  0  1 ] 
    [ -1  0 ] 

We pause for a definition. 

Definition Let V be a metric vector space. If u,v ∈ V have the 
property that 

(u,u) = (v,v) = 0, (u,v) = 1 

the ordered pair (u,v) is called a hyperbolic pair, and the subspace 
H = span{u,v} is called a hyperbolic plane. Any space of the form 
H_1 ⊙ ... ⊙ H_k, where each H_i is a hyperbolic plane, is called a 
hyperbolic space. □ 

Note that in an orthogonal geometry, if (u,v) is a hyperbolic pair, 
then (v,u) = 1, but in a symplectic space, (v,u) = -1. 

Now let us return to the discussion at hand. Since H is 
nonsingular, we have V = H ⊙ H^⊥, where H^⊥ is also nonsingular. 
Hence, we may repeat the preceding construction in H^⊥ to obtain an 
orthogonal decomposition of V of the form 

V = H_1 ⊙ H_2 ⊙ ... ⊙ H_k 

where each H_i is a hyperbolic plane. This proves the following result. 

Theorem 11.10 Any nonsingular symplectic geometry V is a 
hyperbolic space, that is, 

V = H_1 ⊙ H_2 ⊙ ... ⊙ H_k 

where each H_i is a hyperbolic plane. Thus, there is a basis for V for 
which the matrix of the form is 

    [  0  1                  ] 
    [ -1  0                  ] 
    [        0  1            ] 
M = [       -1  0            ] 
    [             ...        ] 
    [                  0  1  ] 
    [                 -1  0  ] 

In particular, the dimension of V is even. □ 
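
The construction used to prove Theorem 11.10 is effectively an algorithm: pick a vector u, find a partner v with (u,v) = 1, and pass to the orthogonal complement of span{u,v}. The following Python sketch carries this out over the rationals. It is one possible rendering, not the text's own procedure; the function names and the sample alternate matrix A are our choices.

```python
from fractions import Fraction


def form(a, x, y):
    """Value [x]^T A [y] of the bilinear form with matrix A."""
    n = len(a)
    return sum(x[i] * a[i][j] * y[j] for i in range(n) for j in range(n))


def hyperbolic_basis(a):
    """Split a nonsingular symplectic geometry into hyperbolic pairs.

    `a` is an alternate matrix over Q. Returns basis vectors
    (u_1, v_1, ..., u_k, v_k) in which each (u_i, v_i) is a hyperbolic
    pair and distinct pairs are orthogonal. Raises StopIteration if
    the form is singular.
    """
    n = len(a)
    remaining = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    basis = []
    while remaining:
        u = remaining.pop(0)
        # Nonsingularity guarantees some remaining w with (u, w) != 0.
        idx = next(i for i, w in enumerate(remaining) if form(a, u, w) != 0)
        v = remaining.pop(idx)
        c = form(a, u, v)
        v = [t / c for t in v]  # rescale so that (u, v) = 1
        # Replace each remaining w by a vector orthogonal to span{u, v}.
        projected = []
        for w in remaining:
            wu, wv = form(a, w, u), form(a, w, v)
            projected.append([w[i] - wv * u[i] + wu * v[i] for i in range(n)])
        remaining = projected
        basis += [u, v]
    return basis


def gram(a, vectors):
    return [[form(a, x, y) for y in vectors] for x in vectors]


A = [[0, 1, 2, 0], [-1, 0, 1, 1], [-2, -1, 0, 3], [0, -1, -3, 0]]
G = gram(A, hyperbolic_basis(A))
```

For the sample matrix A, the matrix G of the form in the new basis is the block diagonal matrix of Theorem 11.10 with two blocks.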



Corollary 11.11 Any symplectic geometry V has the form 

V = H_1 ⊙ H_2 ⊙ ... ⊙ H_k ⊙ N 

where each H_i is a hyperbolic plane, and N is a null space. □ 



Orthogonal Geometry - Orthogonal Bases 

The structure of orthogonal geometries is more closely tied to the 
characteristic of the base field than is the case for symplectic 
geometries. 

Definition Let V be an orthogonal geometry. A basis ℬ = 
{u_1,...,u_n} for V is said to be orthogonal if (u_i,u_j) = 0 for i ≠ j. □ 

A basis ℬ for V is orthogonal if and only if the matrix M_ℬ of 
the form is diagonal. It happens that any orthogonal geometry has an 
orthogonal basis, provided that in case char(F) = 2, we exclude the case 
where V is both orthogonal and symplectic, since no nonnull symplectic 
geometry can have an orthogonal basis. (The matrix with respect to 
such a basis would have 0s off the diagonal, by orthogonality of the 
basis, and 0s on the diagonal, by virtue of V being symplectic.) 

Clearly, we may exclude from consideration the case where V is 
null, since in this case, all bases are orthogonal. 

Let us consider first the case where V is nonsingular, orthogonal, 
and char(F) ≠ 2. Let u ∈ V have the property that (u,u) ≠ 0. Such a 
vector must exist, for if not, then V would be symplectic, and for 
char(F) ≠ 2, there are no nonnull metric vector spaces that are both 
orthogonal and symplectic. Since the subspace S = span{u} is 
nonsingular, we have 

V = S ⊙ S^⊥ 

where S^⊥ is nonsingular and orthogonal. Hence, we can repeat the 
argument on S^⊥, to get 

V = S ⊙ T ⊙ T^⊥ 

where S and T are one-dimensional subspaces and T^⊥ denotes the 
orthogonal complement of T in S^⊥. Continuing in this way, we get 

V = S_1 ⊙ ... ⊙ S_n 

where S_i is spanned by a vector u_i for which (u_i,u_i) ≠ 0. Hence, the 
basis {u_1,...,u_n} is an orthogonal basis for V. Theorem 11.9 then 
implies that any orthogonal metric vector space (singular or 
nonsingular), with char(F) ≠ 2, has an orthogonal basis. 

As to the case where V is nonsingular, orthogonal and 
char(F) = 2, assuming that V is not symplectic implies that there is a 
nonnull vector u in V, and so, with S = span{u}, we have 

V = S ⊙ S^⊥ 

just as before. Now, we know that S^⊥ is nonsingular and orthogonal. 
If it is not symplectic, then we may choose another nonnull vector and 
repeat the process. This will continue until we meet a nonsingular, 
orthogonal, symplectic subspace T of V, which is the orthogonal sum 
of hyperbolic planes, according to Theorem 11.10. Hence, we have 

V = S_1 ⊙ ... ⊙ S_k ⊙ H_1 ⊙ ... ⊙ H_m 

where each S_i is spanned by a nonnull vector and each H_j is a 
hyperbolic plane. Now, we leave it to the reader to show that a matrix 
of the form 

    [ a 0 0 ] 
M = [ 0 0 1 ] 
    [ 0 1 0 ] 

where a ≠ 0, is congruent to a diagonal matrix. Hence, we can replace 
the basis vectors for S_k and H_1 by basis vectors that will replace 
S_k ⊙ H_1 by T_k ⊙ T_{k+1} ⊙ T_{k+2}, where each summand has dimension 1. 
Continuing with this process, we eventually get V as an orthogonal 
sum of one-dimensional subspaces, and so V has an orthogonal basis. 
Another appeal to Theorem 11.9 handles the general (singular and 
nonsingular) case. 
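
For a single concrete instance of the matrix above, take F = F_2 and a = 1. The sketch below checks that one particular change of basis (our choice, with columns u + h_1, u + h_2 and u + h_1 + h_2, where u spans S_k and (h_1,h_2) is the hyperbolic pair) produces a diagonal matrix. It does not replace the general argument left to the reader.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul_mod(a, b, p):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % p
             for j in range(len(b[0]))] for i in range(len(a))]


def det3(m):
    """Integer determinant of a 3 x 3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


p = 2
M = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  # the matrix above with a = 1 in F_2
# Columns of P: u + h_1, u + h_2, u + h_1 + h_2.
P = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]
D = matmul_mod(matmul_mod(transpose(P), M, p), P, p)
```

Here P is invertible over F_2 and D = P^T M P is the 3 x 3 identity matrix, so in this instance the new basis is even orthonormal.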

Let us summarize. 

Theorem 11.12 Let V be an orthogonal geometry. Provided that V 
is not symplectic as well when char(F) = 2, then V has an ordered 
orthogonal basis ℬ = (u_1,...,u_k,z_1,...,z_m) for which (u_i,u_i) = a_i ≠ 0 
and (z_i,z_i) = 0. Hence, M_ℬ has the diagonal form 

M_ℬ = diag(a_1,...,a_k,0,...,0) 

with k nonzero entries and m zeros on the diagonal. Furthermore, the number 
k is the rank of the bilinear form, and so k is uniquely determined 
by V. ∎ 

Corollary 11.13 Let M be a symmetric matrix. If char(F) = 2, we 
assume that M has a nonzero entry on the diagonal. Then M is 
congruent to a diagonal matrix. □ 



The Structure of an Orthogonal Geometry 

According to Corollary 11.13, for char(F) ≠ 2, any symmetric 
matrix over F is congruent to a diagonal matrix. However, since two 
distinct diagonal matrices can be congruent, we cannot say that the 
diagonal matrices form a set of canonical forms for congruence. 

It should come as no surprise that the determination of a set of 
canonical forms for congruence depends on the properties of the base 
field. To see this more clearly, suppose that ℬ = (b_1,...,b_n) is an 
ordered orthogonal basis for V, and so the matrix of the form has the 
diagonal form 

M_ℬ = diag(a_1, a_2, ..., a_n) 

If r_1,...,r_n are nonzero scalars, the set C = (r_1 b_1,...,r_n b_n) is also an 
ordered orthogonal basis for V, and 

(r_i b_i, r_j b_j) = r_i r_j (b_i,b_j) = r_i^2 a_i if i = j, and 0 if i ≠ j 

Hence the matrix of the bilinear form with respect to C is 

(11.5) M_C = diag(r_1^2 a_1, r_2^2 a_2, ..., r_n^2 a_n) 

Thus, by a simple change of basis, we can multiply any diagonal entry 
by a nonzero square in F. 

Before considering some possibilities, we have the following 
definition. 



Definition An orthogonal basis {u_1,...,u_n} for V is an orthonormal 
basis if (u_i,u_i) = 1 for all i. □ 

Algebraically Closed Fields 

A field F with the property that every polynomial p(x) ∈ F[x] 
splits into linear factors over F is said to be algebraically closed. For 
example, the field of complex numbers is algebraically closed. However, 
the field of real numbers is not algebraically closed. 

If F is algebraically closed, then the equation x^2 - r = 0 has 
a solution in F, that is, every element of F has a square root in F. 
Therefore, we may choose r_i = 1/√a_i for each a_i ≠ 0 in (11.5), which 
leads to the following result. 

Theorem 11.14 Let V be an orthogonal geometry over an 
algebraically closed field F. Provided that V is not symplectic as well 
when char(F) = 2, then V has an ordered orthogonal basis ℬ = 
(u_1,...,u_k,z_1,...,z_m) for which (u_i,u_i) = 1 and (z_i,z_i) = 0, and so 
M_ℬ has the diagonal form 

M_ℬ = Z_{k,m} = diag(1,...,1,0,...,0) 

with k ones and m zeros on the diagonal. Furthermore, the number 
k is the rank of the bilinear form, and so k is uniquely determined 
by V. In particular, if V is nonsingular as well, then V has an 
orthonormal basis. ∎ 

The matrix version of Theorem 11.14 follows. 

Theorem 11.15 Let S be the set of all n x n symmetric matrices over 
an algebraically closed field F. In case char(F) = 2, we restrict S to 
be the set of all symmetric matrices with at least one nonzero entry on 
the main diagonal. 

1) Any matrix in S is congruent to a unique matrix of the form 
Z_{k,m}, for some k = 0,...,n and m = n - k. 

2) The set of all matrices of the form Z_{k,m}, for k + m = n, is a set 
of canonical forms for congruence on S. 

3) The rank of a matrix is a complete invariant under congruence 
on S. ∎ 

The Real Field R 

As we have remarked, the real field R is not algebraically closed. 
However, referring to (11.5), we can choose r_i = 1/√|a_i| whenever 
a_i ≠ 0, so that all diagonal elements will be either 0, 1 or -1. 



Theorem 11.16 (Sylvester's law of inertia) Any orthogonal geometry 
V over the real field R has an ordered orthogonal basis ℬ = 
(u_1,...,u_k,v_1,...,v_m,z_1,...,z_p) for which (u_i,u_i) = 1, (v_i,v_i) = -1 and 
(z_i,z_i) = 0. Hence, the matrix M_ℬ has the diagonal form 

M_ℬ = Z_{k,m,p} = diag(1,...,1,-1,...,-1,0,...,0) 

with k ones, m negative ones, and p zeros on the diagonal. 
Moreover, the numbers k, m and p are uniquely determined by V. 



Proof. We need only prove the uniqueness statement. Let 

P = span{u_1,...,u_k}, N = span{v_1,...,v_m}, Z = span{z_1,...,z_p} 

Then if v = Σ r_i u_i ∈ P, we have 

(v,v) = (Σ r_i u_i, Σ r_j u_j) = Σ_{i,j} r_i r_j (u_i,u_j) = Σ r_i^2 ≥ 0 

with equality only for v = 0, and so the bilinear form ( , ) is positive 
definite on P. Similarly, the form is negative definite on N, that is, 
(v,v) < 0 for all nonzero v ∈ N. Finally, the form is zero on Z. Now 
suppose that C is an ordered basis of a similar type to ℬ, with k' ones, 
m' negative ones and p' zeros, and let 

P' = span{u'_1,...,u'_k'}, N' = span{v'_1,...,v'_m'}, Z' = span{z'_1,...,z'_p'} 

Then 

P ∩ span{N',Z'} = {0} 

for if v ∈ P, then (v,v) ≥ 0 and if v ∈ span{N',Z'}, then (v,v) ≤ 0, 
and so v ∈ P ∩ span{N',Z'} implies that (v,v) = 0, that is, v = 0. 

Thus, if dim(V) = n, then 

dim(P) + dim(span{N',Z'}) ≤ dim(V) 

that is, 

k + (n - k') ≤ n 

Hence k ≤ k'. By symmetry, k' ≤ k and so k = k'. In a similar way, 
we deduce that m = m' and p = p'. ∎ 

Here is the matrix version of Theorem 11.16. 

Theorem 11.17 Let S be the set of all n x n symmetric matrices over 
the real field R. 

1) Any matrix in S is congruent to a unique matrix of the form 
Z_{k,m,p}, for some k, m and p = n - k - m. 

2) The set of all matrices of the form Z_{k,m,p}, for k + m + p = n, is a 
set of canonical forms for congruence on S. 

3) The pair (k,m), or equivalently the pair (k + m, k - m), is a 
complete invariant under congruence on S. The number k + m 
is the rank of the form, and k - m is called the signature of the 
form. ∎ 
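
The numbers k, m and p can be computed for a given symmetric matrix by diagonalizing it through simultaneous row and column operations and then counting signs. The sketch below does this over the rationals. It uses a standard symmetric elimination, which is our choice of method and not a procedure given in the text; the names diagonalize and inertia are ours.

```python
from fractions import Fraction


def diagonalize(m):
    """Diagonal of a matrix congruent over Q to the symmetric matrix m."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    for k in range(n):
        if a[k][k] == 0:
            j = next((j for j in range(k + 1, n) if a[j][j] != 0), None)
            if j is not None:
                # Swap basis vectors k and j (rows and columns together).
                a[k], a[j] = a[j], a[k]
                for row in a:
                    row[k], row[j] = row[j], row[k]
            else:
                j = next((j for j in range(k + 1, n) if a[k][j] != 0), None)
                if j is None:
                    continue  # basis vector k is orthogonal to all later ones
                # Replace b_k by b_k + b_j; the new entry is 2*a[k][j] != 0.
                a[k] = [x + y for x, y in zip(a[k], a[j])]
                for row in a:
                    row[k] += row[j]
        for i in range(k + 1, n):
            # Replace b_i by b_i - f*b_k, which makes b_i orthogonal to b_k.
            f = a[i][k] / a[k][k]
            a[i] = [x - f * y for x, y in zip(a[i], a[k])]
            for row in a:
                row[i] -= f * row[k]
    return [a[i][i] for i in range(n)]


def inertia(m):
    """Return (k, m, p): numbers of positive, negative and zero diagonal entries."""
    d = diagonalize(m)
    return (sum(1 for x in d if x > 0),
            sum(1 for x in d if x < 0),
            sum(1 for x in d if x == 0))
```

For instance, the matrix of Minkowski space gives inertia (3, 1, 0), a real hyperbolic plane with matrix [[0, 1], [1, 0]] gives (1, 1, 0), and two congruent matrices always give the same triple, as Sylvester's law requires.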

Finite Fields 

To deal with the case of finite fields, we need two preliminary 
results. 

Theorem 11.18 Let F_q be a finite field with q elements. 

1) If char(F_q) = 2, then every element of F_q is a square. 

2) If char(F_q) ≠ 2, then exactly half of the nonzero elements of F_q 
are squares. Moreover, if x is any nonsquare in F_q, then all 
nonsquares have the form r^2 x, for some r ∈ F_q. 

Proof. We first remark that, in any field F, the equation x^2 = 1 has 
exactly the solutions x = 1 and x = -1, which are distinct if and only if 
char(F) ≠ 2. Now, let F = F_q, and let F* be the set of all nonzero 
elements in F. Consider the set 

(F*)^2 = {a^2 | a ∈ F*} 

of all nonzero squares in F. Observe that, for a,b ∈ F* 

a^2 = b^2 ⟺ (ab^(-1))^2 = 1 ⟺ ab^(-1) = ±1 ⟺ a = ±b 

Thus, if char(F) = 2, 

a^2 = b^2 ⟺ a = b 

so the squaring map is injective on the finite set F*, and hence 
(F*)^2 = F*, which proves part (1). On the other hand, if 
char(F) ≠ 2, then the map {a,-a} → a^2 is a one-to-one correspondence 
between the set of pairs {a,-a} of (distinct) elements of F* and (F*)^2, 
and so |F*| = 2|(F*)^2|. We leave proof of the last statement to the 
reader. ∎ 
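
Theorem 11.18 is easy to confirm by direct computation in small cases. The sketch below only treats the prime fields F_p, represented as integers mod p (an assumption made for simplicity; fields with q = p^n elements, n > 1, would need a different representation). The function name is ours.

```python
def nonzero_squares(p):
    """The set (F*)^2 of nonzero squares in the prime field F_p."""
    return {(a * a) % p for a in range(1, p)}


p = 7
squares = nonzero_squares(p)                    # nonzero squares mod 7
nonsquares = set(range(1, p)) - squares         # nonsquares mod 7
x = min(nonsquares)                             # one fixed nonsquare
multiples = {(r * r * x) % p for r in range(1, p)}  # all r^2 x with r != 0
```

For p = 7 the squares are {1, 2, 4}, exactly half of the six nonzero elements, and the set of square multiples of the nonsquare x = 3 is the full set {3, 5, 6} of nonsquares.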

Definition A bilinear form on V is universal if for any 0 ≠ r ∈ F, there 
exists a vector v for which (v,v) = r. □ 

Theorem 11.19 Let V be a nonsingular orthogonal geometry over a 
finite field, with dim(V) ≥ 2 and char(F) ≠ 2. Then the bilinear form 
of V is universal. 

Proof. Suppose first that V contains a null vector u. Since V is 
nonsingular, there must exist a vector v in V for which {u,v} is 
linearly independent, and (u,v) ≠ 0. Let w = αu + βv. For any 
c ≠ 0, we want to determine α and β so that 

(11.6) c = (w,w) = 2αβ(u,v) + β^2(v,v) 

But, setting β = 1, this is easily solved for α, since 2(u,v) ≠ 0. 
Hence, in this case, ( , ) is universal. 

Now suppose that V has no null vectors, and that {u,v} are 
linearly independent, with 

(u,u) = a ≠ 0, (v,v) = b ≠ 0, (u,v) = 0 

Let w = αu + βv. We want to find α and β for which 

c = (w,w) = aα^2 + bβ^2 

Replacing a by ac and b by bc and dividing by c ≠ 0, our goal is 
to show that, in any finite field of characteristic different from 2, the 
equation 

(11.7) aα^2 + bβ^2 = 1 

with a and b nonzero always has a solution (α,β). 

If a is a square, then we may set β = 0, to get 

α^2 = a^(-1), or α = √(a^(-1)) 

Similarly, if b is a square, we may set α = 0 and solve for β. So let 
us assume that a and b are nonsquares. 

Observe that -1 is the sum of squares in F_q, since if q = p^n, 
the characteristic of F_q is p, and so 

-1 = 1^2 + ... + 1^2 

where there are p - 1 summands on the right. Hence, any element 
c ∈ F_q is the sum of squares, since 

4c = (1 + c)^2 + (-1)(1 - c)^2 

From this, we deduce that the sum of two squares cannot always be a 
square, for then all elements of F_q would be squares, contradicting 
Theorem 11.18. Hence, there exist nonzero squares r^2 and s^2 in F_q 
for which r^2 + s^2 is a nonsquare. 

Thus, a, b and r^2 + s^2 are all nonsquares in F_q. Since 
Theorem 11.18 implies that the product of any two nonsquares is a 
square, we deduce that 

b = u^2 a and r^2 + s^2 = v^2 a 

for some nonzero u,v ∈ F_q. Setting α = r/(av) and β = s/(uva) gives 

aα^2 + bβ^2 = ar^2/(a^2 v^2) + bs^2/(u^2 v^2 a^2) 

= (ar^2 u^2 + bs^2)/(u^2 v^2 a^2) 

= b(r^2 + s^2)/(u^2 v^2 a^2) 

= b v^2 a/(u^2 v^2 a^2) = b/(u^2 a) = 1 

This completes the proof. ∎ 
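
The key step of the proof, that equation (11.7) is always solvable, can be confirmed by exhaustive search in small prime fields. The sketch below is only a check for the fields F_p with p an odd prime, represented as integers mod p; the function name is ours.

```python
def solve_diagonal(a, b, p):
    """Find (alpha, beta) in F_p with a*alpha^2 + b*beta^2 = 1, or None."""
    for alpha in range(p):
        for beta in range(p):
            if (a * alpha * alpha + b * beta * beta) % p == 1:
                return alpha, beta
    return None


# Search every pair of nonzero coefficients (a, b) in several odd prime fields.
unsolved = [(p, a, b)
            for p in (3, 5, 7, 11, 13)
            for a in range(1, p)
            for b in range(1, p)
            if solve_diagonal(a, b, p) is None]
```

The list unsolved comes out empty, so in each of these fields every equation aα^2 + bβ^2 = 1 with a and b nonzero has a solution, as the proof shows in general.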



Now we can proceed with the business at hand. Let us settle the 
case char(F) = 2 first. 



Theorem 11.20 Let V be an orthogonal geometry over a finite field 
F, with char(F) = 2. If V is not symplectic, then V has an ordered 
orthogonal basis ℬ = (u_1,...,u_k,z_1,...,z_m) for which (u_i,u_i) = 1 and 
(z_i,z_i) = 0, and so M_ℬ has the diagonal form 

M_ℬ = Z_{k,m} = diag(1,...,1,0,...,0) 

with k ones and m zeros on the diagonal. Furthermore, the number 
k is the rank of the bilinear form, and is uniquely determined by V. 
In particular, if V is nonsingular, then V has an orthonormal basis. 

Proof. Referring to (11.5), since every element of F has a square root, 
we may take r_i = 1/√a_i for each a_i ≠ 0. ∎ 

The case char(F) / 2 is a bit more difficult. 



Theorem 11.21 Let V be an orthogonal geometry over a finite field
F, with char(F) ≠ 2. Then there exists a nonzero number d, and an
orthogonal basis $\mathcal{B} = (u_1,\ldots,u_k,z_1,\ldots,z_m)$ for which

$\langle u_i,u_i\rangle = 1$ for $1 \le i \le k-1$, $\langle u_k,u_k\rangle = d$, $\langle z_i,z_i\rangle = 0$

Hence, the matrix of the form in this basis is

$M_{\mathcal{B}} = \mathrm{diag}(1,\ldots,1,d,0,\ldots,0)$

The rank k of this matrix is uniquely determined by V. The number
d is uniquely determined, up to multiplication by a square in F, by
V. Moreover, the set $\{r^2 d \mid 0 \neq r \in F\}$, which is the discriminant of the
form when V is nonsingular, is uniquely determined by V.

Proof. We know that there is an ordered orthogonal basis $\mathcal{B} =
(u_1,\ldots,u_k,z_1,\ldots,z_m)$ for which $\langle u_i,u_i\rangle = a_i \neq 0$ and $\langle z_i,z_i\rangle = 0$.
Hence, $M_{\mathcal{B}}$ has the diagonal form

(11.8) $M_{\mathcal{B}} = \mathrm{diag}(a_1,\ldots,a_k,0,\ldots,0)$

Now, consider the orthogonal geometry $V_1 = \mathrm{span}\{u_1,u_2\}$. Then $V_1$
is nonsingular, since $a_1a_2 \neq 0$, and so the form $\langle\,,\rangle$, restricted to $V_1$,
is universal. Hence, there exists a $v_1 \in V_1$ for which $\langle v_1,v_1\rangle = 1$.

Since $\{u_1,u_2\}$ is a basis for $V_1$, we have $v_1 = ru_1 + su_2$. If $s = 0$,
then we form the ordered basis $\mathcal{B}_1 = (v_1,u_2,\ldots,u_k,z_1,\ldots,z_m)$. The
matrix of the form with respect to this basis is the same as (11.8),
except that it has a 1 in the upper left corner (in place of $a_1$). If
$s \neq 0$, then we form the ordered basis $\mathcal{B}_1 = (v_1,u_1,u_3,\ldots,u_k,z_1,\ldots,z_m)$,
which will have the effect of replacing $a_1$ by 1 and $a_2$ by $a_1$.

We now repeat the process with the subspace $V_2$ generated by
the second and third vectors in the new ordered basis $\mathcal{B}_1$. Continuing
in this way, we can replace each $a_i$ by a 1, for $1 \le i \le k - 1$. We
leave the remainder of the proof to the reader. ∎

Isometries 

We now turn to a discussion of isometries on metric vector spaces. 

Definition Let V and W be metric vector spaces. We use the same
notation $\langle\,,\rangle$ for the bilinear form for each space. A bijective linear
map $\tau:V \to W$ is called an isometry if

$\langle \tau u,\tau v\rangle = \langle u,v\rangle$

for all vectors u and v in V. If an isometry exists from V to W,
we say that V and W are isometric, and write $V \approx W$. It is evident
that the set of all isometries from V to V forms a group under
composition, called the group of V.

If V is a nonsingular orthogonal geometry, an isometry from V
to V is called an orthogonal transformation. If V is a nonsingular
symplectic geometry, an isometry from V to V is called a symplectic
transformation. □

Note that an isometry $\tau \in \mathcal{L}(V)$ is always injective if V is
nonsingular. (It is customary in the theory of metric vector spaces to
require an isometry to be bijective, unlike the special case of real or
complex inner product spaces.) Here are a few of the basic properties of
isometries.

Theorem 11.22 Let $\tau \in \mathcal{L}(V,W)$ be a linear transformation between
finite dimensional metric vector spaces V and W.

1) Let $\mathcal{B} = \{v_1,\ldots,v_n\}$ be a basis for V. Then $\tau$ is an isometry if
and only if $\tau$ is bijective, and

$\langle \tau v_i,\tau v_j\rangle = \langle v_i,v_j\rangle$

for all i,j.

2) If char(F) ≠ 2, then $\tau$ is an isometry if and only if it is bijective
and

$\langle \tau(v),\tau(v)\rangle = \langle v,v\rangle$

for all $v \in V$. ∎

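Theorem 11.23 Let $\tau \in \mathcal{L}(V)$ be a linear operator on a finite
dimensional metric vector space V. Let $\mathcal{B} = (v_1,\ldots,v_n)$ be an
ordered basis for V, and let $M_{\mathcal{B}}$ be the matrix of the form relative to
$\mathcal{B}$. Then $\tau$ is an isometry if and only if

(11.9) $[\tau]_{\mathcal{B}}^{\top} M_{\mathcal{B}} [\tau]_{\mathcal{B}} = M_{\mathcal{B}}$

Proof. Dropping the subscript $\mathcal{B}$ on coordinate matrices for readability, we have

$\langle x,y\rangle = [x]^{\top} M_{\mathcal{B}} [y]$

and

$\langle \tau(x),\tau(y)\rangle = [\tau(x)]^{\top} M_{\mathcal{B}} [\tau(y)] = [x]^{\top}[\tau]^{\top} M_{\mathcal{B}} [\tau][y]$

Hence

$\langle x,y\rangle = \langle \tau(x),\tau(y)\rangle$

for all $x,y \in V$ if and only if

$[x]^{\top} M_{\mathcal{B}} [y] = [x]^{\top}[\tau]^{\top} M_{\mathcal{B}} [\tau][y]$

for all $x,y \in V$, which holds if and only if

$M_{\mathcal{B}} = [\tau]^{\top} M_{\mathcal{B}} [\tau]$ ∎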

If $\tau$ is an isometry, then (11.9) holds, and we may take
determinants to get

$\det(M_{\mathcal{B}}) = \det([\tau]_{\mathcal{B}})^2 \det(M_{\mathcal{B}})$

Therefore, if V is nonsingular, then $\det(M_{\mathcal{B}}) \neq 0$, and so

$\det([\tau]_{\mathcal{B}}) = \pm 1$

Since the determinant is an invariant under similarity, we can make the
following definition.

Definition Let $\tau \in \mathcal{L}(V)$ be an orthogonal transformation. The
determinant of $\tau$ is the determinant of any matrix $[\tau]_{\mathcal{B}}$ representing
$\tau$. If $\det(\tau) = 1$, then $\tau$ is called a rotation, and if $\det(\tau) = -1$, then
$\tau$ is called a reflection. □
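
Condition (11.9) and the determinant classification are easy to test on a small example. The following is a minimal sketch, assuming a two-dimensional hyperbolic plane over the rationals with Gram matrix `M` as written; the helper names and the three sample matrices are illustrative choices, not taken from the text.

```python
from fractions import Fraction


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def det2(A):
    """Determinant of a 2x2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def is_isometry(T, M):
    """Test condition (11.9): T^t M T == M."""
    return matmul(matmul(transpose(T), M), T) == M


# Gram matrix of a hyperbolic plane in a hyperbolic basis (illustrative choice).
M = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]
stretch = [[Fraction(2), Fraction(0)], [Fraction(0), Fraction(1, 2)]]
swap = [[Fraction(0), Fraction(1)], [Fraction(1), Fraction(0)]]
shear = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(1)]]

for name, T in (("stretch", stretch), ("swap", swap), ("shear", shear)):
    print(name, is_isometry(T, M), det2(T))
```

Here `stretch` satisfies (11.9) with determinant 1 (a rotation in the sense of the definition), `swap` satisfies it with determinant -1 (a reflection), and `shear` has determinant 1 but fails (11.9), so determinant ±1 alone does not make a map an isometry.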

Because the Riesz representation theorem is valid in any
nonsingular metric vector space, we can define the adjoint $\tau^*$ of a
linear map $\tau$ exactly as we did in Chapter 9, that is, by the condition

$\langle \tau(v),w\rangle = \langle v,\tau^*(w)\rangle$

Theorem 11.24 Let $\tau \in \mathcal{L}(V)$ be a linear operator on a finite
dimensional nonsingular metric vector space V.

1) $\tau$ is an isometry (orthogonal transformation) if and only if it is
unitary, that is, if and only if $\tau$ is bijective and $\tau\tau^* = \iota$.

2) Let $\tau$ be an isometry. If $V = S \odot S^\perp$ and S is invariant under
$\tau$, then so is $S^\perp$.

Proof. We prove part (2). Since S is invariant under $\tau$, we have
$\tau(S) \subseteq S$. But $\dim(\tau(S)) = \dim(S)$, and so $\tau(S) = S$, and $S = \tau^{-1}(S)$.
Now, suppose that $v \in S^\perp$. Then, for any $s \in S$, in view of part (1),
and the fact that $\tau^{-1}(s) \in S$, we have

$\langle \tau(v),s\rangle = \langle v,\tau^{-1}(s)\rangle = 0$

and so $\tau(v) \in S^\perp$. ∎



Symmetries 

Suppose that V is a nonsingular metric vector space over F,
where char(F) ≠ 2, and let $u \in V$ be nonnull. Consider the linear
map

$\sigma_u(v) = v - \frac{2\langle v,u\rangle}{\langle u,u\rangle}\,u$

It is not hard to verify that $\sigma_u$ has the following properties.

1) $\sigma_u$ is an isometry

2) $\sigma_u(u) = -u$

3) $\sigma_u(x) = x$ for all $x \in (\mathrm{span}\{u\})^\perp$

In view of these properties, we refer to $\sigma_u$ as the symmetry determined
by u. Note that properties (2) and (3) uniquely determine the linear
map $\sigma_u$, since $V = \mathrm{span}\{u\} \odot (\mathrm{span}\{u\})^\perp$. Note also that

$\sigma_u = -\iota|_{\langle u\rangle} \odot \iota|_{\langle u\rangle^\perp}$

where $\iota$ is the identity map, and $\langle u\rangle$ is the subspace spanned by the
vector u.
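
The three properties of a symmetry can be checked numerically. This is a minimal sketch, assuming vectors are coordinate lists over the rationals and the form is given by a symmetric Gram matrix `G`; the names `form` and `symmetry` and the sample matrix are hypothetical.

```python
from fractions import Fraction


def form(x, y, G):
    """Bilinear form <x, y> = x^t G y for a symmetric Gram matrix G."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))


def symmetry(u, G):
    """Return the map sigma_u(v) = v - (2<v,u>/<u,u>) u; u must be nonnull."""
    uu = Fraction(form(u, u, G))
    if uu == 0:
        raise ValueError("u is null, so sigma_u is not defined")

    def sigma(v):
        c = Fraction(2 * form(v, u, G)) / uu
        return [vi - c * ui for vi, ui in zip(v, u)]

    return sigma


# A nonsingular orthogonal geometry on Q^3 (illustrative choice of form).
G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
u = [1, 2, 1]            # <u,u> = 1 + 4 - 1 = 4, so u is nonnull
sigma = symmetry(u, G)

print(sigma(u))          # property 2: the image is -u
print(sigma([2, -1, 0])) # property 3: [2,-1,0] is orthogonal to u, so it is fixed
```

Exact rational arithmetic is used so that equalities such as $\sigma_u(u) = -u$ can be tested without rounding error.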

We will require the following property of symmetries. 

Theorem 11.25 Let V be a nonsingular orthogonal geometry over a
field F, with char(F) ≠ 2. If u and v are nonzero vectors in V, with
$\langle u,u\rangle = \langle v,v\rangle \neq 0$, then there exists a symmetry $\sigma$ for which $\sigma(u) = v$
or $\sigma(u) = -v$.

Proof. Suppose first that $u + v$ is nonnull. Then the symmetry $\sigma_{u+v}$
is defined, and

$\sigma_{u+v}(u + v) = -(u + v)$

Further, since $\langle u - v,u + v\rangle = 0$, we have

$\sigma_{u+v}(u - v) = u - v$

Combining these two shows that $\sigma_{u+v}(u) = -v$, as desired.

If $u + v$ is null, then $u - v$ must be nonnull, for otherwise u
would be null. Hence, the symmetry $\sigma_{u-v}$ is defined. Moreover,

$\sigma_{u-v}(u - v) = -(u - v)$

and

$\sigma_{u-v}(u + v) = u + v$

These equations show that $\sigma_{u-v}(u) = v$, as desired. ∎
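
The case split in the proof translates directly into a procedure. The sketch below assumes the same coordinate setup as before (rational vectors and a symmetric Gram matrix `G`); `symmetry_between` is a hypothetical name, and it returns the vector `w` defining the symmetry together with the sign for which $\sigma_w(u) = \pm v$.

```python
from fractions import Fraction


def form(x, y, G):
    """Bilinear form <x, y> = x^t G y for a symmetric Gram matrix G."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))


def reflect(w, x, G):
    """Apply the symmetry sigma_w to x; w must be nonnull."""
    c = Fraction(2 * form(x, w, G)) / Fraction(form(w, w, G))
    return [xi - c * wi for xi, wi in zip(x, w)]


def symmetry_between(u, v, G):
    """Return (w, sign) with sigma_w(u) = sign * v, as in Theorem 11.25."""
    if form(u, u, G) != form(v, v, G) or form(u, u, G) == 0:
        raise ValueError("need <u,u> = <v,v> != 0")
    s = [a + b for a, b in zip(u, v)]
    if form(s, s, G) != 0:
        return s, -1          # u + v nonnull: sigma_{u+v}(u) = -v
    d = [a - b for a, b in zip(u, v)]
    return d, 1               # u + v null: sigma_{u-v}(u) = v


G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
w1, sign1 = symmetry_between([3, 4, 0], [5, 0, 0], G)
w2, sign2 = symmetry_between([1, 0, 0], [-1, 1, 1], G)   # here u + v is null
print(reflect(w1, [3, 4, 0], G), sign1)
print(reflect(w2, [1, 0, 0], G), sign2)
```

The second call exercises the branch where $u + v$ is null, which cannot occur in a Euclidean space but does occur for indefinite forms.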

The next result indicates the importance of symmetries. 

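Theorem 11.26 Let V be a nonsingular orthogonal geometry over a
field F with char(F) ≠ 2. Then any orthogonal transformation
$\tau:V \to V$ is the product of symmetries on V.

Proof. We proceed by induction on $d = \dim(V)$, leaving the case $d = 1$
to the reader. Assume the theorem is true for $d - 1$, and let
$\dim(V) = d$. Choose a nonnull vector $u \in V$. Since $\langle \tau(u),\tau(u)\rangle =
\langle u,u\rangle$, we may apply Theorem 11.25 to deduce the existence of a
symmetry $\sigma$ on V for which $\sigma(\tau(u)) = \varepsilon u$, where $\varepsilon = \pm 1$. Since $\sigma$
is an isometry, if $x \in \langle u\rangle^\perp$, then

$\langle \sigma\tau(x),u\rangle = \langle \sigma\tau(x),\varepsilon\sigma\tau(u)\rangle = \varepsilon\langle x,u\rangle = 0$

and so $\sigma\tau(\langle u\rangle^\perp) \subseteq \langle u\rangle^\perp$. Hence, $\langle u\rangle$ and $\langle u\rangle^\perp$ are both invariant
under $\sigma\tau$. By the induction hypothesis, applied to the (d-1)-dimensional
space $\langle u\rangle^\perp$,

(11.10) $\sigma\tau|_{\langle u\rangle^\perp} = \sigma_{w_1}\cdots\sigma_{w_k}$

where $w_i \in \langle u\rangle^\perp$ and $\sigma_{w_i}$ is the symmetry on $\langle u\rangle^\perp$ defined by

$\sigma_{w_i}(v) = v - \frac{2\langle v,w_i\rangle}{\langle w_i,w_i\rangle}\,w_i$

But this is also a symmetry on V, where $\sigma_{w_i}(u) = u$. Thus, thinking
of $\sigma_{w_i}$ as defined on all of V, we have

(11.11) $\sigma_{w_1}\cdots\sigma_{w_k}(u) = u = \varepsilon\sigma\tau(u)$

Now we distinguish two cases. If $\varepsilon = 1$, then (11.10) and (11.11)
show that

$\sigma\tau = \sigma_{w_1}\cdots\sigma_{w_k}$

on both $\langle u\rangle$ and $\langle u\rangle^\perp$, which implies that $\sigma\tau = \sigma_{w_1}\cdots\sigma_{w_k}$ on V.

Finally, since a symmetry is its own inverse, we have

$\tau = \sigma\sigma_{w_1}\cdots\sigma_{w_k}$

On the other hand, if $\varepsilon = -1$, then (11.11) gives

$\sigma_u\sigma_{w_1}\cdots\sigma_{w_k}(u) = \sigma_u(u) = -u = \sigma\tau(u)$

and since $\sigma_u$ fixes all vectors in $\langle u\rangle^\perp$, and $\langle u\rangle^\perp$ is invariant under
$\sigma\tau$, (11.10) gives, for $x \in \langle u\rangle^\perp$,

$\sigma_u\sigma_{w_1}\cdots\sigma_{w_k}(x) = \sigma_u(\sigma\tau(x)) = \sigma\tau(x)$

Thus, in this case,

$\sigma\tau = \sigma_u\sigma_{w_1}\cdots\sigma_{w_k}$

and so

$\tau = \sigma\sigma_u\sigma_{w_1}\cdots\sigma_{w_k}$ ∎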



Witt’s Cancellation Theorem 

We now come to one of the major results of orthogonal geometry, 
due to Witt. To wit: 

Theorem 11.27 (Witt’s cancellation theorem) Let V be a nonsingular
orthogonal geometry over a field F, with char(F) ≠ 2. Suppose that

$V = S \odot S^\perp = T \odot T^\perp$

where S and T are nonsingular. Then

$S \approx T \;\Rightarrow\; S^\perp \approx T^\perp$

Proof. Let $\tau:S \to T$ be an isometry. We proceed by induction on
$\dim(S)$. Suppose first that $\dim(S) = 1$, and that $S = \mathrm{span}\{s\}$. Then
$T = \mathrm{span}\{\tau(s)\}$ and $\langle \tau(s),\tau(s)\rangle = \langle s,s\rangle$. According to Theorem 11.25,
there is a symmetry $\sigma$ for which $\sigma(s) = \varepsilon\tau(s)$, where $\varepsilon = \pm 1$. Hence,
$\sigma$ is an isometry of V for which $\sigma(S) = T$. It follows that

$x \in S^\perp \Leftrightarrow \langle x,s\rangle = 0 \Leftrightarrow \langle \sigma(x),\sigma(s)\rangle = 0 \Leftrightarrow \langle \sigma(x),\tau(s)\rangle = 0 \Leftrightarrow \sigma(x) \in T^\perp$

and so the restriction $\sigma|_{S^\perp}$ is an isometry from $S^\perp$ to $T^\perp$, which
shows that $S^\perp \approx T^\perp$.

Now suppose the theorem is true for $\dim(S) < k$, and let
$\dim(S) = k$. Let $\tau:S \to T$ be an isometry. Since S is nonsingular, we
can choose a nonnull vector $s \in S$, and write

$S = \mathrm{span}\{s\} \odot U$

where U is nonsingular. Moreover,

$T = \mathrm{span}\{\tau(s)\} \odot \tau(U)$

Thus,

$V = \mathrm{span}\{s\} \odot U \odot S^\perp$

and

$V = \mathrm{span}\{\tau(s)\} \odot \tau(U) \odot T^\perp$

Now we may apply Witt’s Theorem for the one-dimensional case to
deduce that

$U \odot S^\perp \approx \tau(U) \odot T^\perp$

Suppose that $\sigma: U \odot S^\perp \to \tau(U) \odot T^\perp$ is an isometry. Then, since
$\sigma(U \odot S^\perp) = \sigma(U) \odot \sigma(S^\perp)$, we have

$\sigma(U) \odot \sigma(S^\perp) = \tau(U) \odot T^\perp$

But $\sigma(U) \approx \tau(U)$, and since $\dim(\sigma(U)) = \dim(U) < \dim(S)$, the
induction hypothesis implies that $S^\perp \approx \sigma(S^\perp) \approx T^\perp$. ∎



Witt’s Extension Theorem

Suppose that V and V' are nonsingular orthogonal geometries,
and that $\sigma:V \to V'$ is an isometry. Suppose also that U is a
nonsingular subspace of V, and $\tau:U \to \tau(U) \subseteq V'$ is an isometry. We
want to show that $\tau$ can be extended to an isometry on V.

Since U is nonsingular, so is $\tau(U)$. We deduce that

$V' = \tau(U) \odot \tau(U)^\perp = \sigma(U) \odot \sigma(U^\perp)$

Since $\tau(U) \approx \sigma(U)$, the Witt cancellation theorem implies that
$\tau(U)^\perp \approx \sigma(U^\perp)$. If $\rho:\sigma(U^\perp) \to \tau(U)^\perp$ is an isometry, then the
product $\rho\sigma|_{U^\perp}: U^\perp \to \tau(U)^\perp$ is also an isometry, and so

$\bar{\tau} = \tau \odot \rho\sigma|_{U^\perp}$

is an isometry with the property that $\bar{\tau}|_U = \tau$. This is the extension
of $\tau$ that we have been seeking.

We now propose to show that the assumption that U is 
nonsingular is not needed. The plan is to show first that any subspace 
U of V can be embedded in a nonsingular subspace, and that any 
isometry on U can be extended to this nonsingular subspace. Then we 
may appeal to the nonsingular case, as described earlier. 

Theorem 11.28 Let V be a nonsingular orthogonal geometry over F,
with char(F) ≠ 2. Let U be a subspace of V. Write $U =
\mathrm{Rad}(U) \odot W$, where W is nonsingular (which we can do by Theorem
11.9). Suppose that $\mathcal{B} = \{b_1,\ldots,b_k\}$ is a basis for $\mathrm{Rad}(U)$. Then
there exist vectors $\{z_1,\ldots,z_k\}$ for which

1) the pairs $(b_i,z_i)$ are hyperbolic pairs, and so the spaces $H_i =
\mathrm{span}\{b_i,z_i\}$ are hyperbolic planes, and

2) U is contained in the nonsingular space $H_1 \odot \cdots \odot H_k \odot W$.

Moreover, if $\tau:U \to \tau(U) \subseteq V'$ is an isometry, where V' is nonsingular,
then there exists an isometry $\bar{\tau}:V \to V'$ for which $\bar{\tau}|_U = \tau$.

Proof. We prove (1) and (2) by induction on $k = \dim(\mathrm{Rad}(U))$. For
$k = 0$, there is nothing to prove. To get a feel for the procedure, let
$k = 1$. Thus, $\mathcal{B} = \{b_1\}$ is a basis for $\mathrm{Rad}(U)$.

We want to find a $z_1 \in V$ for which $(b_1,z_1)$ is a hyperbolic pair,
that is,

(11.12) $\langle b_1,b_1\rangle = \langle z_1,z_1\rangle = 0$ and $\langle b_1,z_1\rangle = 1$

and for which, letting $H_1 = \mathrm{span}\{b_1,z_1\}$, we have

$H_1 \cap W = \{0\}$ and $H_1 \perp W$

Suppose we find a $z_1 \in W^\perp$ for which (11.12) holds. Then if $x =
rb_1 + sz_1 \in H_1 \cap W$, we get

$0 = \langle rb_1 + sz_1,z_1\rangle = r$

and so $x = sz_1 \in W \cap W^\perp = \{0\}$, since W is nonsingular. Hence,

$H_1 \cap W = \{0\}$

Thus, since $b_1,z_1 \in W^\perp$ imply $H_1 \perp W$, we need only find a
$z_1 \in W^\perp$ for which (11.12) holds. Since $b_1 \in W^\perp$, and $\mathrm{Rad}(W^\perp) =
\mathrm{Rad}(W) = \{0\}$, there must exist a vector $x \in W^\perp$ such that
$\langle b_1,x\rangle \neq 0$. Let us set $z_1 = rb_1 + sx$, and show there exist r and s
for which (11.12) holds, that is, for which

$1 = \langle b_1,z_1\rangle = \langle b_1,rb_1 + sx\rangle = s\langle b_1,x\rangle$

and

$0 = \langle z_1,z_1\rangle = \langle rb_1 + sx,rb_1 + sx\rangle = 2rs\langle b_1,x\rangle + s^2\langle x,x\rangle$

Since $\langle b_1,x\rangle \neq 0$, the first of these equations can be solved for s, and
the second can then be solved for r. Thus, the desired vector $z_1$ exists
for which $(b_1,z_1)$ is a hyperbolic pair, and (1) and (2) hold for the case
k = 1.
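
Solving the two displayed equations gives $s = 1/\langle b_1,x\rangle$ and $r = -s\langle x,x\rangle/(2\langle b_1,x\rangle)$. The sketch below computes the resulting vector; it assumes rational coordinates and a symmetric Gram matrix `G`, and it does not model the subspace W or the requirement $x \in W^\perp$, which the caller would have to arrange. The name `hyperbolic_partner` is hypothetical.

```python
from fractions import Fraction


def form(x, y, G):
    """Bilinear form <x, y> = x^t G y for a symmetric Gram matrix G."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))


def hyperbolic_partner(b, x, G):
    """Given a null vector b and x with <b,x> != 0, return z = r*b + s*x
    satisfying <z,z> = 0 and <b,z> = 1, so that (b, z) is a hyperbolic pair."""
    bx = Fraction(form(b, x, G))
    if form(b, b, G) != 0 or bx == 0:
        raise ValueError("need b null and <b,x> nonzero")
    s = 1 / bx                                   # from 1 = s<b,x>
    r = -s * Fraction(form(x, x, G)) / (2 * bx)  # from 0 = 2rs<b,x> + s^2<x,x>
    return [r * bi + s * xi for bi, xi in zip(b, x)]


G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # illustrative nonsingular form on Q^3
b = [1, 0, 1]                            # null: 1 - 1 = 0
x = [1, 0, 0]                            # <b,x> = 1
z = hyperbolic_partner(b, x, G)
print(z, form(z, z, G), form(b, z, G))
```

The division by 2 in the formula for r is where the hypothesis char(F) ≠ 2 enters.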

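Assume for the purposes of induction that (1) and (2) are true for
$\dim(\mathrm{Rad}(U)) < k$, and let $\dim(\mathrm{Rad}(U)) = k$. Then

$U = \mathrm{Rad}(U) \odot W = \mathrm{span}\{b_k\} \odot \mathrm{span}\{b_1,\ldots,b_{k-1}\} \odot W$

and if we let $U_0 = \mathrm{span}\{b_1,\ldots,b_{k-1}\} \odot W$, then $\mathrm{Rad}(U_0) =
\mathrm{span}\{b_1,\ldots,b_{k-1}\}$. Now, since $b_k \in U_0^\perp$, and since

$\mathrm{Rad}(U_0^\perp) = \mathrm{Rad}(U_0) = \mathrm{span}\{b_1,\ldots,b_{k-1}\}$

does not contain $b_k$, we deduce as before the existence of a vector
$x \in U_0^\perp$ for which $\langle b_k,x\rangle \neq 0$. Again as before, we deduce the existence
of a vector $z_k \in U_0^\perp$ for which $(b_k,z_k)$ is a hyperbolic pair. Let

$H_k = \mathrm{span}\{b_k,z_k\}$

Since $H_k \subseteq U_0^\perp$, we have $U_0 \subseteq H_k^\perp$, and since $H_k^\perp$ is nonsingular,
and $\dim(\mathrm{Rad}(U_0)) = k - 1$, we may apply the induction hypothesis to
$U_0$, as a subspace of $H_k^\perp$. Hence, there exist hyperbolic planes
$H_1,\ldots,H_{k-1}$ in $H_k^\perp$ for which

$U_0 \subseteq \overline{U}_0 = H_1 \odot \cdots \odot H_{k-1} \odot W$

and since $\overline{U}_0 \subseteq H_k^\perp$, we have $H_k \perp \overline{U}_0$ and

$U \subseteq H_1 \odot \cdots \odot H_k \odot W$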

To prove the final statement of the theorem, suppose that
$\tau:U \to \tau(U) \subseteq V'$ is an isometry, where V' is nonsingular. We know
that

$U = \langle b_1\rangle \odot \cdots \odot \langle b_k\rangle \odot W \subseteq H_1 \odot \cdots \odot H_k \odot W$

where $\langle b_i\rangle$ is the subspace spanned by $b_i$, and $H_i = \mathrm{span}\{b_i,z_i\}$. Since
$\tau$ is an isometry, we have

$\tau(U) = \langle \tau(b_1)\rangle \odot \cdots \odot \langle \tau(b_k)\rangle \odot \tau(W)$

Now, let $\bar{\tau}(b_i) = \tau(b_i)$ and $\bar{\tau}(w) = \tau(w)$ for all $w \in W$. We need
only choose $\bar{\tau}(z_i)$ so as to make $\bar{\tau}$ an isometry.

To this end, let us apply the first part of the theorem to the
nonsingular space V', with subspace $\tau(U)$. Since $\{\tau(b_1),\ldots,\tau(b_k)\}$ is
a basis for $\mathrm{Rad}(\tau(U))$, we deduce the existence of vectors $w_i \in V'$ for
which

$\tau(U) \subseteq K_1 \odot \cdots \odot K_k \odot W'$

where $W' = \tau(W)$ and the $K_i = \mathrm{span}\{\tau(b_i),w_i\}$ are hyperbolic planes in V'. Hence, if
we let $\bar{\tau}(z_i) = w_i$, it follows easily that $\bar{\tau}$ is an isometry. ∎

Theorem 11.28, together with the discussion preceding the
theorem, gives the following.

Theorem 11.29 (Witt’s extension theorem) Let V and V' be
isometric nonsingular orthogonal geometries over a field F, with
char(F) ≠ 2. Suppose that U is a nonsingular subspace of V, and
$\tau:U \to U' \subseteq V'$ is an isometry. Then $\tau$ can be extended to all of V,
that is, there is an isometry $\bar{\tau}:V \to V'$ for which $\bar{\tau}|_U = \tau$. ∎

We consider an application of Witt’s extension theorem. Let V
be a nonsingular orthogonal geometry over a field F, with char(F) ≠ 2.
Suppose that U and U' are maximal null subspaces of V. (That is,
U and U' are not properly contained in any null subspaces of V.)
We propose to show that $\dim(U) = \dim(U')$.

If $\dim(U) \le \dim(U')$, then there is a vector space isomorphism
$\tau:U \to \tau(U) \subseteq U'$, which is also an isometry, since U and U' are null.
Thus, Witt’s extension theorem implies the existence of an isometry
$\bar{\tau}:V \to V$ that extends $\tau$. In particular, $\bar{\tau}^{-1}(U')$ is a null space that
contains U, and so $\bar{\tau}^{-1}(U') = U$, which shows that $\dim(U) = \dim(U')$.
We now have the following.

Theorem 11.30 Let V be a nonsingular orthogonal geometry over a
field F, with char(F) ≠ 2. Then all maximal null subspaces of V
have the same dimension, which is called the Witt index of V, and is
denoted by $w(V)$. ∎



Maximal Hyperbolic Subspaces 

Since a hyperbolic space is completely determined (up to isometry)
by its dimension, it is of interest to know something about maximal
hyperbolic subspaces of a nonsingular orthogonal geometry. (In the
symplectic case, if V is nonsingular, then V is hyperbolic.) We will
denote a hyperbolic space by $\mathcal{H}$, and a hyperbolic space of dimension
2k by $\mathcal{H}_{2k}$,

$\mathcal{H}_{2k} = H_1 \odot \cdots \odot H_k$

where each $H_i$ is a hyperbolic plane.

Note that a two-dimensional space is a hyperbolic plane if and
only if it is nonsingular and contains a null vector. (We assume that
char(F) ≠ 2.)

Suppose that V is isotropic, that is, V contains a null vector.
If $U_k$ is a nonempty null subspace of V of dimension k, then
$\mathrm{Rad}(U_k) = U_k$, and so we may apply Theorem 11.28, to deduce that

$U_k \subseteq \mathcal{H}_{2k} = H_1 \odot \cdots \odot H_k$

where $H_i$ is generated by a hyperbolic pair $(x_i,y_i)$. Thus, any null
subspace is contained in a hyperbolic space $\mathcal{H}_{2k}$ with
$\dim(\mathcal{H}_{2k}) = 2\dim(U_k)$. This implies that the Witt index of V is at
most $\dim(V)/2$.

On the other hand, suppose that

$\mathcal{H}_{2k} = H_1 \odot \cdots \odot H_k$

is a hyperbolic space in V, and that $H_i$ is generated by the hyperbolic
pair $(x_i,y_i)$. Then the set $\{x_1,\ldots,x_k\}$ is independent, for if

$r_1x_1 + \cdots + r_kx_k = 0$

it follows that

$0 = \langle r_1x_1 + \cdots + r_kx_k,y_j\rangle = r_j\langle x_j,y_j\rangle = r_j$

for all j. Moreover, since $\langle x_i,x_j\rangle = 0$ for all i,j, the subspace $U_k =
\mathrm{span}\{x_1,\ldots,x_k\}$ is a k-dimensional null space. Thus, any hyperbolic space
$\mathcal{H}_{2k}$ in V contains a null space $U_k$. This implies that if $\mathcal{H}_{2m}$ is a
maximal hyperbolic subspace of V, then $m \le w(V)$. Furthermore,
since V must contain a null space $U_{w(V)}$, it must also contain a
hyperbolic subspace of dimension $2w(V)$. In other words, the
maximum dimension of a hyperbolic subspace of V is $2w(V)$.

Now, suppose that

$\mathcal{H} = H_1 \odot \cdots \odot H_k$

and

$\mathcal{K} = K_1 \odot \cdots \odot K_m$

are maximal hyperbolic subspaces of V, and that $H_i$ is spanned by the
hyperbolic pair $(u_i,v_i)$, and $K_i$ is spanned by the hyperbolic pair
$(x_i,y_i)$. We wish to show that $\dim(\mathcal{H}) = \dim(\mathcal{K})$.

Suppose that $\dim(\mathcal{H}) \le \dim(\mathcal{K})$, and consider the vector space
monomorphism $\tau:\mathcal{H} \to \tau(\mathcal{H}) \subseteq \mathcal{K}$ defined by the conditions

$\tau(u_i) = x_i$, $\tau(v_i) = y_i$

According to Theorem 11.22, $\tau$ is an isometry, and so $\mathcal{H} \approx \tau(\mathcal{H})$.

Thus, Witt’s extension theorem implies the existence of an isometry
$\bar{\tau}:V \to V$ that extends $\tau$. In particular, $\bar{\tau}^{-1}(\mathcal{K})$ is a hyperbolic space
that contains $\mathcal{H}$, and so $\bar{\tau}^{-1}(\mathcal{K}) = \mathcal{H}$, which shows that $\dim(\mathcal{H}) =
\dim(\mathcal{K})$. We have shown that all maximal hyperbolic subspaces of V
have the same dimension, namely, $2w(V)$.

Now suppose that $\mathcal{H}$ is a maximal hyperbolic subspace of V.
Since hyperbolic spaces are nonsingular, we have $V = \mathcal{H} \odot \mathcal{H}^\perp$. Then
$\mathcal{H}^\perp$ is anisotropic, that is, it contains no null vectors. To see this,
suppose to the contrary that $x \in \mathcal{H}^\perp$ is a null vector. Since there is a
null subspace $U \subseteq \mathcal{H}$ for which $\dim(U) = \dim(\mathcal{H})/2 = w(V)$, the null
space $U' = \mathrm{span}\{U,x\}$ has dimension $w(V) + 1$, which contradicts the
meaning of the Witt index. Hence, $\mathcal{H}^\perp$ contains no null vectors.

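Conversely, suppose that $\mathcal{H}_{2k}$ is hyperbolic, that $V =
\mathcal{H}_{2k} \odot \mathcal{H}_{2k}^\perp$,
and that $\mathcal{H}_{2k}^\perp$ is anisotropic. Then $\mathcal{H}_{2k}$ is maximal hyperbolic. For
if not, then $\mathcal{H}_{2k}$ is properly contained in a hyperbolic subspace $\mathcal{H}_{2m}$,
and we can write

$\mathcal{H}_{2m} = \mathcal{H}_{2k} \odot \mathcal{K}$

where $\mathcal{K}$ is the orthogonal complement of $\mathcal{H}_{2k}$ in $\mathcal{H}_{2m}$. We
claim that $\mathcal{K}$ is a hyperbolic space of dimension $2(m-k)$. For we do
have

(11.13) $\mathcal{H}_{2m} = \mathcal{H}_{2k} \odot \mathcal{H}_{2(m-k)}$

for some hyperbolic space $\mathcal{H}_{2(m-k)}$, and so, by Witt’s cancellation
theorem, $\mathcal{K} \approx \mathcal{H}_{2(m-k)}$.

Since $\mathcal{H}_{2(m-k)}$ contains a null vector, (11.13) implies that there
is a null vector $x \in \mathcal{H}_{2k}^\perp$, contrary to assumption. Hence, $\mathcal{H}_{2k}$ is
maximal.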

Our discussion of hyperbolic subspaces has established the 
following key result. 



Theorem 11.31 Let V be a nonsingular orthogonal geometry over F
with char(F) ≠ 2. Then all maximal hyperbolic subspaces of V have
dimension $2w(V)$, where $w(V)$ is the Witt index of V. Moreover,

$V = \mathcal{H} \odot S$

where $\mathcal{H}$ is a maximal hyperbolic subspace of V, or $\mathcal{H} = \{0\}$ if V
has no null vectors, and S is an anisotropic subspace of V, that is, S
contains no null vectors. ∎

According to Theorems 11.9 and 11.31, any orthogonal geometry
V over a field F, with char(F) ≠ 2, can be written in the form

$\mathrm{Rad}(V) \odot \mathcal{H} \odot S$

where $\mathrm{Rad}(V)$ is a null space, $\mathcal{H}$ is a hyperbolic space, and S is
anisotropic.



EXERCISES 

1. Prove that a form is symmetric if and only if the matrix of 
the form is symmetric. 

2. Prove that a form is alternate if and only if its matrix $M_{\mathcal{B}} =
(a_{i,j})$ is alternate, that is,

$a_{i,i} = 0$, $a_{i,j} = -a_{j,i}$ $(i \neq j)$

3. Show that a metric vector space V is nonsingular if and only if
the matrix $M_{\mathcal{B}}$ of the form is nonsingular, for any ordered
basis $\mathcal{B}$.

4. Does Minkowski space contain any null vectors? If so, find them. 

5. Is Minkowski space isometric to Euclidean space $\mathbb{R}^4$?

6. If (,) is a symmetric bilinear form on V, show that Q(x) = 
(x,x)/2 is a quadratic form. 

7. Show that $\tau$ is an isometry if and only if $Q(\tau(v)) = Q(v)$ where
Q is the quadratic form associated with the bilinear form on V.
(Here char(F) ≠ 2.)

8. Show that a bijective map $\tau:V \to W$ is an isometry if and only if
for any basis $\{v_1,\ldots,v_n\}$ of V, we have

$\langle \tau v_i,\tau v_j\rangle = \langle v_i,v_j\rangle$

9. Show that if V is a nonsingular orthogonal geometry over a field
F, with char(F) ≠ 2, then any totally isotropic subspace of V is
also a null space.

10. Find a metric vector space V for which is singular. Is 

11. Prove that if x is any nonsquare in a finite field $F_q$, then all
nonsquares have the form $r^2x$, for some $r \in F$. Hence, the
product of any two nonsquares in $F_q$ is a square.

12. Formulate Sylvester’s law of inertia in terms of quadratic forms 
on V. 

13. Let V be any orthogonal geometry over the real field R. Prove
that V can be written as a direct sum $V = \mathcal{P} \odot \mathcal{N} \odot \mathcal{Z}$, where
the bilinear form on V is positive definite on $\mathcal{P}$, negative
definite on $\mathcal{N}$ and zero on $\mathcal{Z}$. Moreover, the dimensions of
$\mathcal{P}$, $\mathcal{N}$ and $\mathcal{Z}$ are uniquely determined by V.

14. Prove that two one-dimensional metric vector spaces are isometric 
if and only if they have the same discriminant. 

15. a) Let U be a subspace of V. Show that the inner product
$\langle x+U,y+U\rangle = \langle x,y\rangle$ is well-defined if and only if $U \subseteq \mathrm{Rad}(V)$.
b) If $U \subseteq \mathrm{Rad}(V)$, when is V/U nonsingular?

16. Let $V = N \odot S$, where N is a null space.

a) Prove that $N = \mathrm{Rad}(V)$ if and only if S is nonsingular.

b) If S is nonsingular, prove that $S \approx V/\mathrm{Rad}(V)$.

17. Let $\dim(V) = \dim(W)$. Prove that $V/\mathrm{Rad}(V) \approx W/\mathrm{Rad}(W)$
implies $V \approx W$.

18. Let $V = S \odot T$.

a) Prove that $\mathrm{Rad}(V) = \mathrm{Rad}(S) \odot \mathrm{Rad}(T)$

b) $V/\mathrm{Rad}(V) \approx S/\mathrm{Rad}(S) \odot T/\mathrm{Rad}(T)$

c) $\dim(\mathrm{Rad}(V)) = \dim(\mathrm{Rad}(S)) + \dim(\mathrm{Rad}(T))$

d) V is nonsingular if and only if S and T are both 
nonsingular. 

19. Verify in detail that the adjoint is well-defined and linear. 

20. Prove that $\tau \in \mathcal{L}(V)$, where V is nonsingular, is an isometry if
and only if it is bijective and unitary.

21. If char(F) ≠ 2, prove that $\tau \in \mathcal{L}(V,W)$ is an isometry if and
only if it is bijective and $\langle \tau(v),\tau(v)\rangle = \langle v,v\rangle$ for all $v \in V$.

22. Let $\mathcal{B} = \{v_1,\ldots,v_n\}$ be a basis for V. Prove that $\tau \in \mathcal{L}(V,W)$
is an isometry if and only if it is bijective and $\langle \tau v_i,\tau v_j\rangle = \langle v_i,v_j\rangle$
for all i,j.

23. Let V be a nonsingular orthogonal geometry, and let $\tau \in \mathcal{L}(V)$
be an isometry.

a) Show that $\dim(\ker(\iota - \tau)) = \dim(\mathrm{im}(\iota - \tau)^\perp)$.

b) Show that $\ker(\iota - \tau) = \mathrm{im}(\iota - \tau)^\perp$. How would you describe
$\ker(\iota - \tau)$ in words?

c) If $\tau$ is a symmetry, what is $\dim(\ker(\iota - \tau))$?

d) Can you characterize symmetries by means of
$\dim(\ker(\iota - \tau))$?

24. A linear transformation $\tau \in \mathcal{L}(V)$ is called unipotent if $\tau - \iota$ is
nilpotent. Suppose that V is an anisotropic metric vector space,
and that $\tau$ is unipotent and isometric. Show that $\tau = \iota$.

25. Let V be a hyperbolic space of dimension 2m, and let U be a
hyperbolic subspace of V of dimension 2k. Show that for each
$k \le j \le m$, there is a hyperbolic subspace $\mathcal{H}_{2j}$ of V for which
$U \subseteq \mathcal{H}_{2j} \subseteq V$.

26. Let V be a symplectic geometry or an orthogonal geometry with
char(F) ≠ 2. Prove that a subspace S of V is a hyperbolic
plane if and only if S is nonsingular, has dimension 2 and
contains a null vector.




CHAPTER 12 

Metric Spaces 



Contents: The Definition. Open and Closed Sets. Convergence in a
Metric Space. The Closure of a Set. Dense Subsets. Continuity.
Completeness. Isometries. The Completion of a Metric Space.
Exercises.



The Definition 

In Chapter 9, we studied the basic properties of real and complex 
inner product spaces. Much of what we did does not depend on whether 
the space in question is finite or infinite dimensional. However, as we 
discussed in Chapter 9, the presence of an inner product, and hence a 
metric, on a vector space, raises a host of new issues related to 
convergence. In this chapter, we discuss briefly the concept of a metric 
space. This will enable us to study the convergence properties of real 
and complex inner product spaces. 

A metric space is not an algebraic structure. Rather it is designed 
to model the abstract properties of distance. 

Definition A metric space is a pair (M,d), where M is a nonempty set
and $d:M \times M \to \mathbb{R}$ is a real-valued function, called a metric on M, with
the following properties. The expression d(x,y) is read “the distance
from x to y.”

1) (Positive definiteness) For all $x,y \in M$,

$d(x,y) \ge 0$, and $d(x,y) = 0$ if and only if $x = y$

2) (Symmetry) For all $x,y \in M$,

$d(x,y) = d(y,x)$

3) (Triangle inequality) For all $x,y,z \in M$,

$d(x,y) \le d(x,z) + d(z,y)$ □



As is customary, when there is no cause for confusion, we simply 
say “let M be a metric space.” 



Example 12.1 Any nonempty set M is a metric space under the
discrete metric, defined by

$d(x,y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases}$ □



Example 12.2

1) The set $\mathbb{R}^n$ is a metric space, under the metric defined, for $x =
(x_1,\ldots,x_n)$ and $y = (y_1,\ldots,y_n)$ by

$d(x,y) = \sqrt{(x_1-y_1)^2 + \cdots + (x_n-y_n)^2}$

This is called the Euclidean metric on $\mathbb{R}^n$. We note that $\mathbb{R}^n$ is
also a metric space under the metric

$d_1(x,y) = |x_1-y_1| + \cdots + |x_n-y_n|$

Of course, $(\mathbb{R}^n,d)$ and $(\mathbb{R}^n,d_1)$ are different metric spaces.

2) The set $\mathbb{C}^n$ is a metric space under the unitary metric

$d(x,y) = \sqrt{|x_1-y_1|^2 + \cdots + |x_n-y_n|^2}$

where $x = (x_1,\ldots,x_n)$ and $y = (y_1,\ldots,y_n)$ are in $\mathbb{C}^n$. □
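
The metric axioms can be spot-checked on a finite sample of points. The sketch below is only a sanity check under stated assumptions: the helper `satisfies_metric_axioms`, the sample points and the tolerance are illustrative choices, and passing on a finite sample does not prove that a function is a metric.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean metric on R^n."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def taxicab(x, y):
    """The metric d_1 on R^n."""
    return sum(abs(a - b) for a, b in zip(x, y))


def discrete(x, y):
    """Discrete metric of Example 12.1."""
    return 0 if x == y else 1


def squared(x, y):
    """Squared Euclidean distance; fails the triangle inequality."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the three axioms on every pair and triple from a finite sample."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False                      # positive definiteness
        if abs(d(x, y) - d(y, x)) > tol:
            return False                      # symmetry
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False                      # triangle inequality
    return True


sample = [(0, 0), (1, 0), (2, 0), (1, 3), (-2, 5)]
for d in (euclidean, taxicab, discrete, squared):
    print(d.__name__, satisfies_metric_axioms(d, sample))
```

The `squared` function is included as a contrast: on the collinear points (0,0), (1,0), (2,0) it gives 4 > 1 + 1, so the triangle inequality fails.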



Example 12.3

1) The set C[a,b] of all real-valued (or complex-valued) continuous
functions on [a,b] is a metric space, under the metric

$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$

We refer to this metric as the sup metric.

2) The set C[a,b] of all real-valued (or complex-valued) continuous
functions on [a,b] is a metric space, under the metric

$d_1(f,g) = \int_a^b |f(x) - g(x)|\,dx$ □

Example 12.4 Many important sequence spaces are metric spaces. We
will often use boldface Roman letters to denote sequences, as in $\mathbf{x} =
(x_n)$ and $\mathbf{y} = (y_n)$.

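1) The set of all bounded sequences of real numbers is a metric
space under the metric defined by

$d(\mathbf{x},\mathbf{y}) = \sup_n |x_n - y_n|$

The set of all bounded complex sequences, with the same
metric, is also a metric space. As is customary, we will usually
denote both of these spaces by $\ell^\infty$.

2) For $p \ge 1$, let $\ell^p$ be the set of all sequences $\mathbf{x} = (x_n)$ of real (or
complex) numbers for which

$\sum_{n=1}^{\infty} |x_n|^p < \infty$

We define the p-norm of $\mathbf{x}$ by

$\|\mathbf{x}\|_p = \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p}$

Then $\ell^p$ is a metric space, under the metric

$d(\mathbf{x},\mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|_p = \left(\sum_{n=1}^{\infty} |x_n - y_n|^p\right)^{1/p}$

The fact that this is a metric on $\ell^p$ follows from some rather famous
results about sequences of real or complex numbers, whose proofs
we leave as (well-hinted) exercises.

Hölder’s inequality Let $p,q > 1$ and $p + q = pq$. If $\mathbf{x} \in \ell^p$ and
$\mathbf{y} \in \ell^q$, then $\mathbf{x}\mathbf{y} = (x_ny_n) \in \ell^1$ and

$\|\mathbf{x}\mathbf{y}\|_1 \le \|\mathbf{x}\|_p \|\mathbf{y}\|_q$

that is,

$\sum_{n=1}^{\infty} |x_ny_n| \le \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p} \left(\sum_{n=1}^{\infty} |y_n|^q\right)^{1/q}$

A special case of this (with p = q = 2) is the Cauchy-Schwarz
inequality

$\sum_{n=1}^{\infty} |x_ny_n| \le \sqrt{\sum_{n=1}^{\infty} |x_n|^2}\,\sqrt{\sum_{n=1}^{\infty} |y_n|^2}$

Minkowski’s inequality For $p \ge 1$, if $\mathbf{x},\mathbf{y} \in \ell^p$, then $\mathbf{x} + \mathbf{y} =
(x_n + y_n) \in \ell^p$ and

$\|\mathbf{x} + \mathbf{y}\|_p \le \|\mathbf{x}\|_p + \|\mathbf{y}\|_p$

that is,

$\left(\sum_{n=1}^{\infty} |x_n + y_n|^p\right)^{1/p} \le \left(\sum_{n=1}^{\infty} |x_n|^p\right)^{1/p} + \left(\sum_{n=1}^{\infty} |y_n|^p\right)^{1/p}$ □

Definition If M is a metric space under d, then any nonempty subset
S of M is also a metric space under the restriction of d to S x S. The
metric space S thus obtained is called a subspace of M. □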
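
Hölder’s and Minkowski’s inequalities from Example 12.4 can be checked numerically on finitely supported sequences. This is a hedged sketch only: it treats a finite list as a sequence that is zero beyond its last entry, uses floating point with a small tolerance, and the function names are hypothetical.

```python
def p_norm(x, p):
    """p-norm of a finitely supported sequence given as a list."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def holder_gap(x, y, p):
    """||x||_p * ||y||_q - ||xy||_1, where q is chosen so that p + q = pq."""
    q = p / (p - 1.0)
    lhs = sum(abs(a * b) for a, b in zip(x, y))
    return p_norm(x, p) * p_norm(y, q) - lhs


def minkowski_gap(x, y, p):
    """||x||_p + ||y||_p - ||x + y||_p."""
    total = [a + b for a, b in zip(x, y)]
    return p_norm(x, p) + p_norm(y, p) - p_norm(total, p)


x = [1.0, -2.0, 3.0, 0.5]
y = [4.0, 0.0, -1.0, 2.0]
for p in (1.5, 2.0, 3.0):
    print(p, holder_gap(x, y, p), minkowski_gap(x, y, p))
```

Both gaps should be nonnegative up to rounding; the case p = 2 of `holder_gap` is the Cauchy-Schwarz inequality.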



Open and Closed Sets 

Definition Let M be a metric space. Let $x_0 \in M$ and let r be a
positive real number.

1) The open ball centered at $x_0$, with radius r, is

$B(x_0,r) = \{x \in M \mid d(x,x_0) < r\}$

2) The closed ball centered at $x_0$, with radius r, is

$\overline{B}(x_0,r) = \{x \in M \mid d(x,x_0) \le r\}$

3) The sphere centered at $x_0$, with radius r, is

$S(x_0,r) = \{x \in M \mid d(x,x_0) = r\}$ □

Definition A subset S of a metric space M is said to be open if each
point of S is the center of an open ball that is contained completely
in S. More specifically, S is open if for all $x \in S$, there exists an
$r > 0$ such that $B(x,r) \subseteq S$. Note that the empty set is open. A set
$T \subseteq M$ is closed if its complement $T^c = M - T$ is open. □

It is easy to show that an open ball is an open set and a closed 
ball is a closed set. If x G M, we refer to any open set S containing x 
as an open neighborhood of x. It is also easy to see that a set is open 
if and only if it contains an open neighborhood of each of its points. 

The next example shows that it is possible for a set to be both 
open and closed, or neither open nor closed. 

Example 12.5 In the metric space $\mathbb{R}$, the open balls are just the open
intervals

$B(x_0,r) = (x_0 - r,x_0 + r)$

and the closed balls are the closed intervals

$\overline{B}(x_0,r) = [x_0 - r,x_0 + r]$

Consider the half-open interval S = (a,b], for a < b. This set is not
open, since it contains no open ball centered at $b \in S$, and it is not
closed, since its complement $S^c = (-\infty,a] \cup (b,\infty)$ is not open (it
contains no open ball about a).

Observe also that the empty set is both open and closed, as is the
entire space $\mathbb{R}$. (Although we will not do so, it is possible to show that
these are the only two sets that are both open and closed in $\mathbb{R}$.) □

It is not our intention to enter into a detailed discussion of open 
and closed sets, the subject of which belongs to a branch of 
mathematics known as topology. In order to put these concepts in 
perspective, however, we have the following result, whose proof is left to 
the reader. 

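Theorem 12.1 The collection $\mathcal{O}$ of all open subsets of a metric space
M has the following properties

1) $\emptyset \in \mathcal{O}$, $M \in \mathcal{O}$

2) If $S, T \in \mathcal{O}$, then $S \cap T \in \mathcal{O}$

3) If $\{S_i \mid i \in K\}$ is any collection of open sets, then $\bigcup_{i \in K} S_i \in \mathcal{O}$. ∎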

These three properties form the basis for an axiom system that is 
designed to generalize notions such as convergence and continuity, and 
lead to the following definition. 

Definition Let X be a nonempty set. A collection $\mathcal{O}$ of subsets of X
is called a topology for X if it has the following properties

1) $\emptyset \in \mathcal{O}$, $X \in \mathcal{O}$

2) If $S, T \in \mathcal{O}$, then $S \cap T \in \mathcal{O}$

3) If $\{S_i \mid i \in K\}$ is any collection of sets in $\mathcal{O}$, then $\bigcup_{i \in K} S_i \in \mathcal{O}$.

We refer to subsets in $\mathcal{O}$ as open sets, and the pair $(X,\mathcal{O})$ as a
topological space. □
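
For a finite set X the three axioms can be verified mechanically. The following sketch assumes X and the candidate open sets are given as Python sets; `is_topology` is a hypothetical helper. Because the family is finite, closure under pairwise unions already gives closure under arbitrary unions of nonempty subfamilies, and the empty union is covered by the first axiom.

```python
from itertools import combinations


def is_topology(X, opens):
    """Check the topology axioms for a finite family of subsets of a finite set X."""
    X = frozenset(X)
    family = {frozenset(s) for s in opens}
    if frozenset() not in family or X not in family:
        return False                                  # axiom 1
    if any(not s <= X for s in family):
        return False                                  # members must be subsets of X
    for S, T in combinations(family, 2):
        if S & T not in family:
            return False                              # axiom 2
        if S | T not in family:
            return False                              # axiom 3 (finite case)
    return True


X = {1, 2, 3}
nested = [set(), {1}, {1, 2}, {1, 2, 3}]
broken = [set(), {1}, {2}, {1, 2, 3}]    # {1} | {2} = {1, 2} is missing
print(is_topology(X, nested), is_topology(X, broken))
```

The discrete metric of Example 12.1 induces the topology in which every subset is open, which is the largest family this check accepts.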

According to Theorem 12.1, the open sets (as we defined them 
earlier) in a metric space M form a topology for M, called the 
topology induced by the metric. 

Topological spaces are the most general setting in which we can 
define concepts such as convergence and continuity, which is why these 
concepts are called topological concepts. However, since the topologies 
with which we will be dealing are induced by a metric, we will generally 
phrase the definitions of the topological properties that we will need 
directly in terms of the metric. 



Convergence in a Metric Space 

Convergence of sequences in a metric space is defined as follows.

Definition A sequence $(x_n)$ in a metric space M converges to $x \in M$,
written $(x_n) \to x$, if

$\lim_{n \to \infty} d(x_n,x) = 0$

Equivalently, $(x_n) \to x$ if for any $\varepsilon > 0$, there exists an $N > 0$ such
that

$n > N \Rightarrow d(x_n,x) < \varepsilon$

or, equivalently

$n > N \Rightarrow x_n \in B(x,\varepsilon)$

In this case, x is called the limit of the sequence $(x_n)$. □
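
As a concrete instance of the definition, take the sequence $x_n = 1/n$ in $\mathbb{R}$ with limit 0. A minimal sketch, assuming the usual metric d(x,y) = |x - y|; the function `index_bound` is a hypothetical name for one valid choice of N, and the loop only samples finitely many indices.

```python
import math


def d(x, y):
    """The usual metric on R."""
    return abs(x - y)


def index_bound(eps):
    """Return an N such that n > N implies d(1/n, 0) < eps."""
    return math.ceil(1 / eps)


for eps in (0.5, 0.1, 0.01):
    N = index_bound(eps)
    inside = all(d(1 / n, 0) < eps for n in range(N + 1, N + 1000))
    print(eps, N, inside)
```

The choice $N = \lceil 1/\varepsilon \rceil$ works because $n > N \ge 1/\varepsilon$ gives $1/n < \varepsilon$; any larger N would serve equally well.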



If M is a metric space, and S is a subset of M, by a sequence 
in S, we mean a sequence whose terms all lie in S. We next 
characterize closed sets, and therefore also open sets, using convergence. 



Theorem 12.2 Let M be a metric space. A subset $S \subseteq M$ is closed if
and only if whenever $(x_n)$ is a sequence in S, and $(x_n) \to x$, then
$x \in S$. In loose terms, a subset S is closed if it is closed under the
taking of sequential limits.

Proof. Suppose that S is closed, and let $(x_n) \to x$, where $x_n \in S$ for
all n. Suppose that $x \notin S$. Then since $x \in S^c$ and $S^c$ is open, there
exists an $\varepsilon > 0$ for which $x \in B(x,\varepsilon) \subseteq S^c$. But this implies that

$B(x,\varepsilon) \cap \{x_n\} = \emptyset$

which contradicts the fact that $(x_n) \to x$. Hence, $x \in S$.

Conversely, suppose that S is closed under the taking of limits.
We show that $S^c$ is open. Let $x \in S^c$, and suppose to the contrary
that no open ball about x is contained in $S^c$. Consider the open balls
$B(x,1/n)$, for n = 1,2,.... Since none of these balls is contained in $S^c$,
for each n, there is an $x_n \in S \cap B(x,1/n)$. It is clear that $(x_n) \to x$, and
so $x \in S$. But x cannot be in both S and $S^c$, and so some ball
about x is in $S^c$, which implies that $S^c$ is open. Thus, S is
closed. ∎



The Closure of a Set 

Definition Let S be any subset of a metric space M. The closure of
S, denoted by $\mathrm{cl}(S)$, is the smallest closed set containing S. □

We should hasten to add that, since the entire space M is closed,
and since the intersection of any collection of closed sets is closed
(exercise), the closure of any set S does exist, and is, in fact, the
intersection of all closed sets containing S. The following definition
will allow us to characterize the closure in another way.

Definition Let S be a nonempty subset of a metric space M. An
element $x \in M$ is said to be a limit point, or accumulation point, of S
if every open ball centered at x meets S at a point other than x
itself. Let us denote the set of all limit points of S by $\ell(S)$. □

Here are some key facts concerning limit points and closures. 

Theorem 12.3 Let S be a nonempty subset of a metric space M.

1) An element $x \in M$ is a limit point of S if and only if there is a
   sequence $(x_n)$ in S for which $x_n \ne x$ for all n, and $(x_n) \to x$.

2) S is closed if and only if $\ell(S) \subseteq S$. In words, S is closed if and
   only if it contains all of its limit points.

3) $\mathrm{cl}(S) = S \cup \ell(S)$.

4) An element x is in $\mathrm{cl}(S)$ if and only if there is a sequence $(x_n)$
   in S for which $(x_n) \to x$.

Proof.

1) Assume first that $x \in \ell(S)$. For each n, there exists a point
   $x_n \ne x$ such that $x_n \in B(x,1/n) \cap S$. Thus, we have

   $$d(x_n,x) < 1/n$$

   and so $(x_n) \to x$. For the converse, suppose that $(x_n) \to x$, where
   $x \ne x_n \in S$. If B(x,r) is any ball centered at x, then there is
   some N such that n > N implies $x_n \in B(x,r)$. Hence, for any
   ball B(x,r) centered at x, there is a point $x_n \ne x$ such that
   $x_n \in S \cap B(x,r)$. Thus, x is a limit point of S.

2) As for part (2), if S is closed, then by part (1), any $x \in \ell(S)$ is
   the limit of a sequence $(x_n)$ in S, and so must be in S. Hence,
   $\ell(S) \subseteq S$. Conversely, if $\ell(S) \subseteq S$, then S is closed. For if $(x_n)$
   is any sequence in S, and $(x_n) \to x$, then there are two possibilities.
   First, we might have $x_n = x$ for some n, in which case
   $x = x_n \in S$. Second, we might have $x_n \ne x$ for all n, in which case
   $(x_n) \to x$ implies that $x \in \ell(S) \subseteq S$. In either case, $x \in S$ and so
   S is closed under the taking of limits, which implies that S is
   closed.

3) Clearly, $S \subseteq T = S \cup \ell(S)$. To show that T is closed, we show
   that it contains all of its limit points. So let $x \in \ell(T)$. Hence,
   there is a sequence $(x_n)$ in T for which $x_n \ne x$ and $(x_n) \to x$. Of
   course, each $x_n$ is either in S, or is a limit point of S. We must
   show that $x \in T$, that is, that x is either in S or is a limit
   point of S.

   Suppose for the purposes of contradiction that $x \notin S$ and
   $x \notin \ell(S)$. Then there is a ball B(x,r) for which $B(x,r) \cap S = \emptyset$.
   However, since $(x_n) \to x$, there must be an $x_n \in B(x,r)$. Since $x_n$
   cannot be in S, it must be a limit point of S. Referring to
   Figure 12.1, if $d(x_n,x) = d < r$, then consider the ball $B(x_n,(r-d)/2)$.
   This ball is completely contained in B(x,r) and must contain an
   element y of S, since its center $x_n$ is a limit point of S. But
   then $y \in S \cap B(x,r)$, a contradiction. Hence, $x \in S$ or $x \in \ell(S)$.
   In either case, $x \in T = S \cup \ell(S)$ and so T is closed.

   Thus, T is closed and contains S, and so $\mathrm{cl}(S) \subseteq T$. On
   the other hand, $T = S \cup \ell(S) \subseteq \mathrm{cl}(S)$, and so $\mathrm{cl}(S) = T$.




4) If $x \in \mathrm{cl}(S)$, then there are two possibilities. If $x \in S$, then the
   constant sequence $(x_n)$, with $x_n = x$ for all n, is a sequence in
   S that converges to x. If $x \notin S$, then $x \in \ell(S)$, and so there is a
   sequence $(x_n)$ in S for which $x_n \ne x$ and $(x_n) \to x$. In either
   case, there is a sequence in S converging to x. Conversely, if
   there is a sequence $(x_n)$ in S for which $(x_n) \to x$, then either
   $x_n = x$ for some n, in which case $x \in S \subseteq \mathrm{cl}(S)$, or else $x_n \ne x$
   for all n, in which case $x \in \ell(S) \subseteq \mathrm{cl}(S)$. ∎



Dense Subsets 

The following concept is meant to convey the idea of a subset
$S \subseteq M$ being "arbitrarily close" to every point in M.

Definition A subset S of a metric space M is dense in M if
$\mathrm{cl}(S) = M$. A metric space is said to be separable if it contains a
countable dense subset. □

Thus, a subset S of M is dense if every open ball about any point
$x \in M$ contains at least one point of S.

Certainly, any metric space contains a dense subset, namely, the
space itself. However, as the next examples show, not every metric
space contains a countable dense subset.



Example 12.6

1) The real line R is separable, since the rational numbers Q form
   a countable dense subset. Similarly, $\mathbb{R}^n$ is separable, since the
   set $\mathbb{Q}^n$ is countable and dense.

2) The complex plane C is separable, as is $\mathbb{C}^n$ for all n.

3) A discrete metric space is separable if and only if it is countable.
   We leave proof of this as an exercise. □
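The density of Q in R can be made concrete: for any real x and any $\epsilon > 0$, the rational $\lfloor xn \rfloor / n$ with $n > 1/\epsilon$ lies within $\epsilon$ of x. Here is a minimal sketch of that construction; the function name `rational_within` is a hypothetical helper, and the "real number" is necessarily a floating-point value (which is itself a rational, so the check below is exact).

```python
import math
from fractions import Fraction

def rational_within(x, eps):
    """Return a rational q with |x - q| < eps, for a float x and rational eps > 0."""
    exact = Fraction(x)                      # a float converts to a fraction exactly
    n = math.floor(1 / Fraction(eps)) + 1    # n > 1/eps, hence 1/n < eps
    # floor(x*n)/n <= x < floor(x*n)/n + 1/n, so the error is below 1/n < eps
    return Fraction(math.floor(exact * n), n)

# math.pi is the float nearest to pi, used here only as a sample point of R
q = rational_within(math.pi, Fraction(1, 1000))
error = abs(Fraction(math.pi) - q)
```

The same idea applied coordinate by coordinate gives a point of $\mathbb{Q}^n$ near any point of $\mathbb{R}^n$.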



Example 12.7 The space $\ell^\infty$ is not separable. Recall that $\ell^\infty$ is the
set of all bounded sequences of real numbers (or complex numbers),
with metric

$$d(x,y) = \sup_n |x_n - y_n|$$

To see that this space is not separable, consider the set S of all binary
sequences

$$S = \{(x_n) \mid x_i = 0 \text{ or } 1 \text{ for all } i\}$$

This set is in one-to-one correspondence with the set of all subsets of N,
and so is uncountable. (It has cardinality $2^{\aleph_0} > \aleph_0$.) Now, each
sequence in S is certainly bounded and so lies in $\ell^\infty$. Moreover, if
$x \ne y \in S$ then the two sequences must differ in at least one position,
and so $d(x,y) = 1$.

In other words, we have a subset S of $\ell^\infty$ that is uncountable,
and for which the distance between any two distinct elements is 1.
This implies that the uncountable collection of balls $\{B(s,1/3) \mid s \in S\}$
is mutually disjoint. Hence, no countable set can meet every ball,
which implies that no countable set can be dense in $\ell^\infty$. □
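The key step, that distinct binary sequences are at sup-distance exactly 1, can be illustrated on finite truncations. The sketch below assumes binary tuples of length 4 as a stand-in for infinite binary sequences; it only illustrates the distance computation, since the uncountability argument genuinely needs infinite sequences. The name `sup_dist` is an illustrative choice.

```python
from itertools import combinations, product

def sup_dist(x, y):
    """Sup metric d(x, y) = max_i |x_i - y_i| on tuples of equal length."""
    return max(abs(a - b) for a, b in zip(x, y))

# All binary tuples of length 4 (a finite truncation of the set S)
binary = list(product((0, 1), repeat=4))

# Distances between all pairs of distinct tuples
pair_distances = {sup_dist(x, y) for x, y in combinations(binary, 2)}

# Balls of radius 1/3 about distinct centers are disjoint exactly when
# the centers are at distance >= 2/3, which holds here since every distance is 1
balls_disjoint = all(dist >= 2 / 3 for dist in pair_distances)
```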



Example 12.8 The metric spaces $\ell^p$ are separable, for $p \ge 1$. The set
S of all sequences of the form

$$s = (q_1,\dots,q_n,0,\dots)$$

for all n > 0, where the $q_i$'s are rational, is a countable set. Let us
show that it is dense in $\ell^p$. Any $x = (x_n) \in \ell^p$ satisfies

$$\sum_{n=1}^{\infty} |x_n|^p < \infty$$

Hence, for any $\epsilon > 0$, there exists an N such that

$$\sum_{n=N+1}^{\infty} |x_n|^p < \frac{\epsilon}{2}$$

Since the rational numbers are dense in R, we can find rational
numbers $q_i$ for which

$$|x_i - q_i|^p < \frac{\epsilon}{2N}$$

for all i = 1,...,N. Hence, if $s = (q_1,\dots,q_N,0,\dots)$, then

$$d(x,s)^p = \sum_{n=1}^{N} |x_n - q_n|^p + \sum_{n=N+1}^{\infty} |x_n|^p < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

which shows that there is an element of S arbitrarily close to any
element of $\ell^p$. Thus, S is dense in $\ell^p$, and so $\ell^p$ is separable. □



Continuity 

Continuity plays a central role in the study of linear operators on 
infinite dimensional inner product spaces. 

Definition Let $f: M \to M'$ be a function from the metric space (M,d) to
the metric space (M',d'). We say that f is continuous at $x_0 \in M$ if
for any $\epsilon > 0$, there exists a $\delta > 0$ such that

$$d(x,x_0) < \delta \Rightarrow d'(f(x),f(x_0)) < \epsilon$$

or, equivalently,

$$f(B(x_0,\delta)) \subseteq B(f(x_0),\epsilon)$$

(See Figure 12.2.) A function is continuous if it is continuous at every
$x_0 \in M$. □




We can use the notion of convergence to characterize continuity 
for functions between metric spaces. 

Theorem 12.4 A function $f: M \to M'$ is continuous if and only if
whenever $(x_n)$ is a sequence in M that converges to $x_0 \in M$, then the
sequence $(f(x_n))$ converges to $f(x_0)$, in short,

$$(x_n) \to x_0 \Rightarrow (f(x_n)) \to f(x_0)$$

Proof. Suppose first that f is continuous at $x_0$, and let $(x_n) \to x_0$.
Then, given $\epsilon > 0$, the continuity of f implies the existence of a $\delta > 0$
such that

$$f(B(x_0,\delta)) \subseteq B(f(x_0),\epsilon)$$

Since $(x_n) \to x_0$, there exists an N > 0 such that $x_n \in B(x_0,\delta)$ for
n > N, and so

$$n > N \Rightarrow f(x_n) \in B(f(x_0),\epsilon)$$

Thus, $(f(x_n)) \to f(x_0)$.

Conversely, suppose that $(x_n) \to x_0$ implies $(f(x_n)) \to f(x_0)$.
Suppose, for the purposes of contradiction, that f is not continuous at
$x_0$. Then there exists an $\epsilon > 0$ such that, for all $\delta > 0$,

$$f(B(x_0,\delta)) \not\subseteq B(f(x_0),\epsilon)$$

Thus, for all n > 0,

$$f(B(x_0,1/n)) \not\subseteq B(f(x_0),\epsilon)$$

and so we may construct a sequence $(x_n)$ by choosing each term $x_n$
with the property that

$$x_n \in B(x_0,1/n), \text{ but } f(x_n) \notin B(f(x_0),\epsilon)$$

Hence, $(x_n) \to x_0$, but $(f(x_n))$ does not converge to $f(x_0)$. This
contradiction implies that f must be continuous at $x_0$. ∎

The next theorem says that the distance function is a continuous 
function in both variables. 

Theorem 12.5 Let (M,d) be a metric space. If $(x_n) \to x$ and
$(y_n) \to y$, then $d(x_n,y_n) \to d(x,y)$.

Proof. According to Exercise 2,

$$|d(x_n,y_n) - d(x,y)| \le d(x_n,x) + d(y_n,y)$$

But the right side tends to 0 as $n \to \infty$, and so $d(x_n,y_n) \to d(x,y)$. ∎
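The inequality used in this proof is easy to test numerically. The following sketch assumes the Euclidean plane as the metric space and two specific sequences chosen only for illustration; `continuity_gap` is a hypothetical helper name.

```python
import math

def continuity_gap(x_n, y_n, x, y):
    """Return (lhs, rhs) of |d(x_n,y_n) - d(x,y)| <= d(x_n,x) + d(y_n,y)."""
    d = math.dist  # Euclidean metric on tuples of coordinates
    lhs = abs(d(x_n, y_n) - d(x, y))
    rhs = d(x_n, x) + d(y_n, y)
    return lhs, rhs

x, y = (0.0, 0.0), (3.0, 4.0)
checks = []
for n in (1, 10, 100, 1000):
    x_n = (1.0 / n, 0.0)        # x_n -> x
    y_n = (3.0, 4.0 + 1.0 / n)  # y_n -> y
    checks.append(continuity_gap(x_n, y_n, x, y))
```

As n grows, the right side shrinks like 2/n, which forces the left side to 0 as the theorem asserts.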



Completeness 

The reader who has studied analysis will recognize the following 
definitions. 

Definition A sequence $(x_n)$ in a metric space M is a Cauchy
sequence if, for any $\epsilon > 0$, there exists an N > 0 for which

$$n,m > N \Rightarrow d(x_n,x_m) < \epsilon$$ □

We leave it to the reader to show that any convergent sequence is
a Cauchy sequence. When the converse holds, the space is said to be
complete.

Definition Let M be a metric space.

1) M is said to be complete if every Cauchy sequence in M
   converges in M.

2) A subspace S of M is complete if it is complete as a metric
   space. Thus, S is complete if every Cauchy sequence $(s_n)$ in S
   converges to an element in S. □
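To see what can go wrong when the converse fails, consider a sketch in the rationals with $d(x,y) = |x-y|$ (an illustrative choice, not an example from the text). The iteration $x_{k+1} = (x_k + 2/x_k)/2$ produces rational terms that crowd together, yet no rational can be the limit, since the limit would have to square to 2. The function name `newton_sqrt2` is hypothetical.

```python
from fractions import Fraction

def newton_sqrt2(terms):
    """Return the first `terms` iterates of x -> (x + 2/x)/2, starting at 1."""
    x = Fraction(1)
    seq = []
    for _ in range(terms):
        seq.append(x)
        x = (x + 2 / x) / 2
    return seq

seq = newton_sqrt2(8)

# Diameter of the tail {x_5, x_6, x_7}: the terms are already extremely close
tail = seq[5:]
tail_diameter = max(abs(a - b) for a in tail for b in tail)

# No term squares to 2, and indeed no rational number does
never_exact = all(x * x != 2 for x in seq)
```

Only finitely many terms are inspected here, so this illustrates the Cauchy behavior rather than proving it.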

Before considering examples, we prove a very useful result about 
completeness of subspaces. 

Theorem 12.6 Let M be a metric space.

1) Any complete subspace of M is closed.

2) If M is complete, then a subspace S of M is complete if and
   only if it is closed.

Proof. To prove (1), assume that S is a complete subspace of M. Let
$(x_n)$ be a sequence in S for which $(x_n) \to x \in M$. Then $(x_n)$ is a
Cauchy sequence in S, and since S is complete, $(x_n)$ must converge
to an element of S. Since limits of sequences are unique, we have
$x \in S$. Hence, S is closed.

To prove part (2), first assume that S is complete. Then part
(1) shows that S is closed. Conversely, suppose that S is closed, and
let $(x_n)$ be a Cauchy sequence in S. Since $(x_n)$ is also a Cauchy
sequence in the complete space M, it must converge to some $x \in M$.
But since S is closed, we have $(x_n) \to x \in S$. Hence, S is complete. ∎

Now let us consider some examples of complete (and incomplete) 
metric spaces. 

Example 12.9 It is well-known that the metric space R is complete.
(However, a proof of this fact would lead us outside the scope of this
book.) Similarly, the complex numbers C are complete. □

Example 12.10 The Euclidean space $\mathbb{R}^n$ and the unitary space $\mathbb{C}^n$
are complete. Let us prove this for $\mathbb{R}^n$. Suppose that $(x_k)$ is a
Cauchy sequence in $\mathbb{R}^n$, where

$$x_k = (x_{k,1},\dots,x_{k,n})$$

Thus,

$$d(x_k,x_m)^2 = \sum_{i=1}^{n} (x_{k,i} - x_{m,i})^2 \to 0 \quad \text{as } k,m \to \infty$$

and so, for each coordinate position i,

$$(x_{k,i} - x_{m,i})^2 \le d(x_k,x_m)^2 \to 0$$

which shows that the sequence $(x_{k,i})_{k=1,2,\dots}$ of ith coordinates is a
Cauchy sequence in R. Since R is complete, we must have

$$(x_{k,i}) \to y_i \quad \text{as } k \to \infty$$

If $y = (y_1,\dots,y_n)$, then

$$d(x_k,y)^2 = \sum_{i=1}^{n} (x_{k,i} - y_i)^2 \to 0 \quad \text{as } k \to \infty$$

and so $(x_k) \to y \in \mathbb{R}^n$. This proves that $\mathbb{R}^n$ is complete. □

Example 12.11 The metric space (C[a,b],d) of all real-valued (or
complex-valued) continuous functions on [a,b], with metric

$$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$

is complete. To see this, we first observe that the limit with respect to
d is the uniform limit on [a,b], that is, $d(f_n,f) \to 0$ if and only if for any
$\epsilon > 0$, there is an N > 0 for which

$$n > N \Rightarrow |f_n(x) - f(x)| \le \epsilon \text{ for all } x \in [a,b]$$

Now, let $(f_n)$ be a Cauchy sequence in (C[a,b],d). Thus, for any
$\epsilon > 0$, there is an N for which

(12.1) $$m,n > N \Rightarrow |f_n(x) - f_m(x)| < \epsilon \text{ for all } x \in [a,b]$$

This implies that, for each $x \in [a,b]$, the sequence $(f_n(x))$ is a Cauchy
sequence of real (or complex) numbers, and so it converges. We can
therefore define a function f on [a,b] by

$$f(x) = \lim_{n\to\infty} f_n(x)$$

Letting $m \to \infty$ in (12.1), we get

$$n > N \Rightarrow |f_n(x) - f(x)| \le \epsilon \text{ for all } x \in [a,b]$$

Thus, $f_n(x)$ converges to f(x) uniformly. It is well-known that the
uniform limit of continuous functions is continuous, and so
$f \in C[a,b]$. Thus, $(f_n) \to f \in C[a,b]$, and so (C[a,b],d) is
complete. □

Example 12.12 The metric space $(C[a,b],d_1)$ of all real-valued (or
complex-valued) continuous functions on [a,b], with metric

$$d_1(f,g) = \int_a^b |f(x) - g(x)| \, dx$$

is not complete. For convenience, we take [a,b] = [0,1] and leave the
general case for the reader. Consider the sequence of functions $f_n(x)$
whose graphs are shown in Figure 12.3. (The definition of $f_n(x)$
should be clear from the graph.)

We leave it to the reader to show that the sequence $(f_n(x))$ is Cauchy,
but does not converge in $(C[0,1],d_1)$. □

Example 12.13 The metric space $\ell^\infty$ is complete. To see this, suppose
that $(x_n)$ is a Cauchy sequence in $\ell^\infty$, where

$$x_n = (x_{n,1},x_{n,2},\dots)$$

Then, for each coordinate position i, we have

(12.2) $$|x_{n,i} - x_{m,i}| \le \sup_j |x_{n,j} - x_{m,j}| \to 0 \quad \text{as } n,m \to \infty$$

Hence, for each i, the sequence $(x_{n,i})_{n=1,2,\dots}$ of ith coordinates is a
Cauchy sequence in R (or C). Since R (or C) is complete, we have

$$(x_{n,i}) \to y_i \quad \text{as } n \to \infty$$

for each coordinate position i = 1,2,.... We want to show that
$y = (y_i) \in \ell^\infty$ and that $(x_n) \to y$.

Letting $m \to \infty$ in (12.2) gives

(12.3) $$\sup_j |x_{n,j} - y_j| \to 0 \quad \text{as } n \to \infty$$

and so, for some n,

$$|x_{n,j} - y_j| < 1 \text{ for all } j$$

and so

$$|y_j| < 1 + |x_{n,j}| \text{ for all } j$$

But since $x_n \in \ell^\infty$, it is a bounded sequence, and therefore so is $(y_j)$.
That is, $y = (y_j) \in \ell^\infty$. Since (12.3) implies that $(x_n) \to y$, we see that
$\ell^\infty$ is complete. □

Example 12.14 The metric space $\ell^p$ is complete. To prove this, let
$(x_n)$ be a Cauchy sequence in $\ell^p$, where

$$x_n = (x_{n,1},x_{n,2},\dots)$$

Then, for each coordinate position i,

$$|x_{n,i} - x_{m,i}|^p \le \sum_{j=1}^{\infty} |x_{n,j} - x_{m,j}|^p = d(x_n,x_m)^p \to 0$$

which shows that the sequence $(x_{n,i})_{n=1,2,\dots}$ of ith coordinates is a
Cauchy sequence in R (or C). Since R (or C) is complete, we have

$$(x_{n,i}) \to y_i \quad \text{as } n \to \infty$$

We want to show that $y = (y_i) \in \ell^p$ and that $(x_n) \to y$.

To this end, observe that for any $\epsilon > 0$, there is an N for which

$$n,m > N \Rightarrow \sum_{i=1}^{r} |x_{n,i} - x_{m,i}|^p < \epsilon$$

for all r > 0. Now, we let $m \to \infty$, to get

$$n > N \Rightarrow \sum_{i=1}^{r} |x_{n,i} - y_i|^p \le \epsilon$$

for all r > 0. Letting $r \to \infty$, we get, for any n > N,

$$\sum_{i=1}^{\infty} |x_{n,i} - y_i|^p \le \epsilon$$

which implies that $x_n - y \in \ell^p$, and so $y = y - x_n + x_n \in \ell^p$, and
in addition, $(x_n) \to y$. □

As we will see in the next chapter, the property of completeness 
plays a major role in the theory of inner product spaces. Inner product 
spaces for which the induced metric space is complete are called Hilbert 
spaces. 



Isometries 

A function between two metric spaces that preserves distance is 
called an isometry. Here is the formal definition. 

Definition Let (M,d) and (M',d') be metric spaces. A function
$f: M \to M'$ is called an isometry if

$$d'(f(x),f(y)) = d(x,y)$$

for all $x,y \in M$. If $f: M \to M'$ is a bijective isometry from M to M',
we say that M and M' are isometric and write $M \approx M'$. □

Theorem 12.7 Let $f: (M,d) \to (M',d')$ be an isometry. Then

1) f is injective

2) f is continuous

3) $f^{-1}: f(M) \to M$ is also an isometry, and hence also continuous.

Proof. To prove (1), we observe that

$$f(x) = f(y) \Leftrightarrow d'(f(x),f(y)) = 0 \Leftrightarrow d(x,y) = 0 \Leftrightarrow x = y$$

To prove (2), let $(x_n) \to x$ in M. Then

$$d'(f(x_n),f(x)) = d(x_n,x) \to 0 \quad \text{as } n \to \infty$$

and so $(f(x_n)) \to f(x)$, which proves that f is continuous. Finally, we
have

$$d(f^{-1}(f(x)),f^{-1}(f(y))) = d(x,y) = d'(f(x),f(y))$$

and so $f^{-1}: f(M) \to M$ is an isometry. ∎



The Completion of a Metric Space 

While not all metric spaces are complete, any metric space can be 
embedded in a complete metric space. To be more specific, we have the 
following important theorem. 

Theorem 12.8 Let (M,d) be any metric space. Then there is a
complete metric space (M',d') and an isometry $\tau: M \to \tau(M) \subseteq M'$ for
which $\tau(M)$ is dense in M'. The metric space (M',d') is called a
completion of (M,d). Moreover, (M',d') is unique, up to bijective
isometry.

Proof. The proof is a bit lengthy, so we divide it into various parts.
We can simplify the notation considerably by thinking of sequences
$(x_n)$ in M as functions $f: \mathbb{N} \to M$, where $f(n) = x_n$.

Cauchy Sequences in M 

The basic idea is to let the elements of M' be equivalence classes
of Cauchy sequences in M. So let CS(M) denote the set of all Cauchy
sequences in M. If $f,g \in CS(M)$ then, intuitively speaking, the terms
f(n) get closer together as $n \to \infty$, and so do the terms g(n). Therefore,
it seems reasonable that d(f(n),g(n)) should approach a finite limit as
$n \to \infty$. Indeed, according to Exercise 2,

$$|d(f(n),g(n)) - d(f(m),g(m))| \le d(f(n),f(m)) + d(g(n),g(m)) \to 0$$

as $n,m \to \infty$, and so d(f(n),g(n)) is a Cauchy sequence of real numbers,
which implies that

(12.4) $$\lim_{n\to\infty} d(f(n),g(n)) < \infty$$

(That is, the limit exists and is finite.)

Equivalence Classes of Cauchy Sequences in M

We would like to define a metric d' on the set CS(M) by

$$d'(f,g) = \lim_{n\to\infty} d(f(n),g(n))$$

However, it is possible that

$$\lim_{n\to\infty} d(f(n),g(n)) = 0$$

for distinct sequences f and g, so this does not define a metric. Thus,
we are led to define an equivalence relation on CS(M) by

$$f \sim g \Leftrightarrow \lim_{n\to\infty} d(f(n),g(n)) = 0$$

Let $\overline{CS(M)}$ be the set of all equivalence classes of Cauchy sequences,
and define, for $\bar{f}, \bar{g} \in \overline{CS(M)}$,

(12.5) $$d'(\bar{f},\bar{g}) = \lim_{n\to\infty} d(f(n),g(n))$$

where $f \in \bar{f}$ and $g \in \bar{g}$.

To see that d' is well-defined, suppose that $f' \in \bar{f}$ and $g' \in \bar{g}$.
Then since $f' \sim f$ and $g' \sim g$, we have

$$|d(f'(n),g'(n)) - d(f(n),g(n))| \le d(f'(n),f(n)) + d(g'(n),g(n)) \to 0$$

as $n \to \infty$. Thus,

$$f' \sim f \text{ and } g' \sim g \Rightarrow \lim_{n\to\infty} d(f'(n),g'(n)) = \lim_{n\to\infty} d(f(n),g(n))$$

$$\Rightarrow d'(\bar{f'},\bar{g'}) = d'(\bar{f},\bar{g})$$

which shows that d' is well-defined. To see that d' is a metric, we
verify the triangle inequality, leaving the rest to the reader. If f, g and
h are Cauchy sequences, then

$$d(f(n),g(n)) \le d(f(n),h(n)) + d(h(n),g(n))$$

Taking limits gives

$$\lim_{n\to\infty} d(f(n),g(n)) \le \lim_{n\to\infty} d(f(n),h(n)) + \lim_{n\to\infty} d(h(n),g(n))$$

and so

$$d'(\bar{f},\bar{g}) \le d'(\bar{f},\bar{h}) + d'(\bar{h},\bar{g})$$

Embedding (M,d) in (M',d')

For each $x \in M$, consider the constant Cauchy sequence [x],
where [x](n) = x for all n. The map $\tau: M \to M'$ defined by

$$\tau(x) = \overline{[x]}$$

is an isometry, since

$$d'(\tau(x),\tau(y)) = \lim_{n\to\infty} d([x](n),[y](n)) = d(x,y)$$

Moreover, $\tau(M)$ is dense in M'. This follows from the fact that we
can approximate any Cauchy sequence in M by a constant sequence.
In particular, let $\bar{f} \in M'$. Since $f \in \bar{f}$ is a Cauchy sequence, for any
$\epsilon > 0$, there exists an N such that

$$n,m \ge N \Rightarrow d(f(n),f(m)) < \epsilon$$

Now, for the constant sequence [f(N)] we have

$$d'(\overline{[f(N)]},\bar{f}) = \lim_{n\to\infty} d(f(N),f(n)) \le \epsilon$$

and so $\tau(M)$ is dense in M'.

(M',d') is Complete

Suppose that

$$\bar{f}_1, \bar{f}_2, \bar{f}_3, \dots$$

is a Cauchy sequence in M'. We wish to find a Cauchy sequence g in
M for which

$$d'(\bar{f}_k,\bar{g}) \to 0 \quad \text{as } k \to \infty$$

Since $\bar{f}_k \in M'$, and since $\tau(M)$ is dense in M', there is a constant
sequence $[c_k]$ for which

$$d'(\bar{f}_k,\overline{[c_k]}) = \lim_{n\to\infty} d(f_k(n),c_k) < \frac{1}{k}$$

Let g be the sequence defined by

$$g(k) = c_k$$

This is a Cauchy sequence in M, since

$$d(c_k,c_j) = d'(\overline{[c_k]},\overline{[c_j]}) \le d'(\overline{[c_k]},\bar{f}_k) + d'(\bar{f}_k,\bar{f}_j) + d'(\bar{f}_j,\overline{[c_j]}) < \frac{1}{k} + d'(\bar{f}_k,\bar{f}_j) + \frac{1}{j} \to 0$$

as $k,j \to \infty$. To see that $(\bar{f}_k)$ converges to $\bar{g}$, observe that

$$d'(\bar{f}_k,\bar{g}) \le d'(\bar{f}_k,\overline{[c_k]}) + d'(\overline{[c_k]},\bar{g}) < \frac{1}{k} + \lim_{n\to\infty} d(c_k,g(n))$$

Now, since g is a Cauchy sequence, for any $\epsilon > 0$, there is an N such
that

$$k,n > N \Rightarrow d(c_k,c_n) < \epsilon$$

In particular,

$$k > N \Rightarrow \lim_{n\to\infty} d(c_k,c_n) \le \epsilon$$

and so

$$k > N \Rightarrow d'(\bar{f}_k,\bar{g}) < \frac{1}{k} + \epsilon$$

which implies that $(\bar{f}_k) \to \bar{g}$, as desired.

Uniqueness 

Finally, we must show that if (M',d') and (M'',d'') are both
completions of (M,d), then $M' \approx M''$. Note that we have bijective
isometries $\tau: M \to \tau(M) \subseteq M'$ and $\sigma: M \to \sigma(M) \subseteq M''$. Hence, the map
$\rho = \sigma\tau^{-1}: \tau(M) \to \sigma(M)$ is a bijective isometry from $\tau(M)$ onto $\sigma(M)$,
where $\tau(M)$ is dense in M'. (See Figure 12.4.)

Figure 12.4

Our goal is to show that $\rho$ can be extended to a bijective isometry $\bar{\rho}$
from M' to M''.

Let $x \in M'$. Then there is a sequence $(a_n)$ in $\tau(M)$ for which
$(a_n) \to x$. Since $(a_n)$ is a Cauchy sequence in $\tau(M)$, $(\rho(a_n))$ is a
Cauchy sequence in $\sigma(M) \subseteq M''$, and since M'' is complete, we have
$(\rho(a_n)) \to y$ for some $y \in M''$. Let us define $\bar{\rho}(x) = y$.

To see that $\bar{\rho}$ is well-defined, suppose that $(a_n) \to x$ and
$(b_n) \to x$, where both sequences lie in $\tau(M)$. Then

$$d''(\rho(a_n),\rho(b_n)) = d'(a_n,b_n) \to 0 \quad \text{as } n \to \infty$$

and so $(\rho(a_n))$ and $(\rho(b_n))$ converge to the same element of M'',
which implies that $\bar{\rho}(x)$ does not depend on the choice of sequence in
$\tau(M)$ converging to x. Thus, $\bar{\rho}$ is well-defined. Moreover, if
$a \in \tau(M)$, then the constant sequence [a] converges to a, and so
$\bar{\rho}(a) = \lim \rho(a) = \rho(a)$, which shows that $\bar{\rho}$ is an extension of $\rho$.

To see that $\bar{\rho}$ is an isometry, suppose that $(a_n) \to x$ and
$(b_n) \to y$. Then $(\rho(a_n)) \to \bar{\rho}(x)$ and $(\rho(b_n)) \to \bar{\rho}(y)$, and since d'' is
continuous, we have

$$d''(\bar{\rho}(x),\bar{\rho}(y)) = \lim_{n\to\infty} d''(\rho(a_n),\rho(b_n)) = \lim_{n\to\infty} d'(a_n,b_n) = d'(x,y)$$

Thus, we need only show that $\bar{\rho}$ is surjective. Note first that
$\sigma(M) = \mathrm{im}(\rho) \subseteq \mathrm{im}(\bar{\rho})$. Thus, if $\mathrm{im}(\bar{\rho})$ is closed, we can deduce from
the fact that $\sigma(M)$ is dense in M'' that $\mathrm{im}(\bar{\rho}) = M''$. So, suppose
that $(\bar{\rho}(x_n))$ is a sequence in $\mathrm{im}(\bar{\rho})$, and $(\bar{\rho}(x_n)) \to z$. Then $(\bar{\rho}(x_n))$
is a Cauchy sequence, and therefore so is $(x_n)$. Thus, $(x_n) \to x \in M'$.
But $\bar{\rho}$ is continuous, and so $(\bar{\rho}(x_n)) \to \bar{\rho}(x)$, which implies that
$\bar{\rho}(x) = z$, and so $z \in \mathrm{im}(\bar{\rho})$. Hence, $\bar{\rho}$ is surjective, and $M' \approx M''$. ∎



EXERCISES 

1. Prove the generalized triangle inequality

   $$d(x_1,x_n) \le d(x_1,x_2) + d(x_2,x_3) + \cdots + d(x_{n-1},x_n)$$

2. a) Use the triangle inequality to prove that

      $$|d(x,y) - d(a,b)| \le d(x,a) + d(y,b)$$

   b) Prove that

      $$|d(x,z) - d(y,z)| \le d(x,y)$$

3. Let $S \subseteq \ell^\infty$ be the subspace of all binary sequences (sequences of
   0s and 1s). Describe the metric on S.

4. Let $M = \{0,1\}^n$ be the set of all binary n-tuples. Define a
   function $h: M \times M \to \mathbb{R}$ by letting h(x,y) be the number of positions
   in which x and y differ. For example, h[(11010),(01001)] = 3.
   Prove that h is a metric. (It is called the Hamming distance
   function and plays an important role in coding theory.)

5. Let $1 \le p < \infty$.

   a) If $x = (x_n) \in \ell^p$, show that $x_n \to 0$.

   b) Find a sequence that converges to 0 but is not an element of
      any $\ell^p$ for $1 \le p < \infty$.

6. a) Show that if $x = (x_n) \in \ell^p$, then $x \in \ell^q$ for all q > p.

   b) Find a sequence $x = (x_n)$ that is in $\ell^p$ for p > 1, but is not
      in $\ell^1$.

7. Show that a subset S of a metric space M is open if and only if
   S contains an open neighborhood of each of its points.

8. Show that the intersection of any collection of closed sets in a
   metric space is closed.

9. Let (M,d) be a metric space. The diameter of a nonempty
   subset $S \subseteq M$ is

   $$\delta(S) = \sup_{x,y \in S} d(x,y)$$

   A set S is bounded if $\delta(S) < \infty$.

   a) Prove that S is bounded if and only if there is some $x \in M$
      and $r \in \mathbb{R}$ for which $S \subseteq B(x,r)$.

   b) Prove that $\delta(S) = 0$ if and only if S consists of a single
      point.

   c) Prove that $S \subseteq T$ implies $\delta(S) \le \delta(T)$.

   d) If S and T are bounded, show that $S \cup T$ is also bounded.

10. Let (M,d) be a metric space. Let d' be the function defined by

    $$d'(x,y) = \frac{d(x,y)}{1 + d(x,y)}$$

    a) Show that (M,d') is a metric space, and that M is bounded
       under this metric, even if it is not bounded under the
       metric d.

    b) Show that the metric spaces (M,d) and (M,d') have the
       same open sets.

11. If S and T are subsets of a metric space (M,d), we define the
    distance between S and T by

    $$\rho(S,T) = \inf_{x \in S,\, y \in T} d(x,y)$$

    a) Is it true that $\rho(S,T) = 0$ if and only if S = T? Is $\rho$ a
       metric?

    b) Show that $x \in \mathrm{cl}(S)$ if and only if $\rho(\{x\},S) = 0$.

12. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every
    neighborhood of x meets S in a point other than x itself.

13. Prove that $x \in M$ is a limit point of $S \subseteq M$ if and only if every
    open ball B(x,r) contains infinitely many points of S.

14. Prove that limits are unique, that is, $(x_n) \to x$, $(x_n) \to y$ implies
    that x = y.

15. Let S be a subset of a metric space M. Prove that $x \in \mathrm{cl}(S)$ if
    and only if there exists a sequence $(x_n)$ in S that converges to x.

16. Prove that the closure has the following properties.

    a) $S \subseteq \mathrm{cl}(S)$
    b) $\mathrm{cl}(\mathrm{cl}(S)) = \mathrm{cl}(S)$
    c) $\mathrm{cl}(S \cup T) = \mathrm{cl}(S) \cup \mathrm{cl}(T)$
    d) $\mathrm{cl}(S \cap T) \subseteq \mathrm{cl}(S) \cap \mathrm{cl}(T)$

    Can the last part be strengthened to equality?

17. a) Prove that the closed ball $\overline{B}(x,r)$ is always a closed subset.

    b) Find an example of a metric space in which the closure of an
       open ball B(x,r) is not equal to the closed ball $\overline{B}(x,r)$.

18. Provide the details to show that $\mathbb{R}^n$ is separable.

19. Prove that $\mathbb{C}^n$ is separable.

20. Prove that a discrete metric space is separable if and only if it is 
countable. 

21. Prove that the metric space $\mathcal{B}[a,b]$ of all bounded functions on
    [a,b], with metric

    $$d(f,g) = \sup_{x \in [a,b]} |f(x) - g(x)|$$

    is not separable.

22. Show that a function $f: (M,d) \to (M',d')$ is continuous if and only if
    the inverse image of any open set is open, that is, if and only if
    $f^{-1}(U) = \{x \in M \mid f(x) \in U\}$ is open in M whenever U is an
    open set in M'.

23. Repeat the previous exercise, replacing the word open by the word
    closed.

24. Give an example to show that if $f: (M,d) \to (M',d')$ is a continuous
    function and U is an open set in M, it need not be the case that
    f(U) is open in M'.

25. Show that any convergent sequence is a Cauchy sequence.

26. If $(x_n) \to x$ in a metric space M, show that any subsequence
    $(x_{n_k})$ of $(x_n)$ also converges to x.

27. Suppose that $(x_n)$ is a Cauchy sequence in a metric space M,
    and that some subsequence $(x_{n_k})$ of $(x_n)$ converges. Prove that
    $(x_n)$ converges to the same limit as the subsequence.

28. Prove that if $(x_n)$ is a Cauchy sequence, then the set $\{x_n\}$ is
    bounded. What about the converse? Is a bounded sequence
    necessarily a Cauchy sequence?

29. Let $(x_n)$ and $(y_n)$ be Cauchy sequences in a metric space M.
    Prove that the sequence $d_n = d(x_n,y_n)$ converges.

30. Show that the space of all convergent sequences of real numbers
    (or complex numbers) is complete as a subspace of $\ell^\infty$.

31. Let $\mathcal{P}$ denote the metric space of all polynomials over C, with
    metric $d(p,q) = \sup_{x \in [a,b]} |p(x) - q(x)|$. Is $\mathcal{P}$ complete?

32. Let $S \subseteq \ell^\infty$ be the subspace of all sequences with finite support
    (that is, with a finite number of nonzero terms). Is S complete?

33. Prove that the metric space Z of all integers, with metric
    $d(n,m) = |n - m|$, is complete.

34. Show that the subspace S of the metric space C[a,b] (under the
    sup metric) consisting of all functions $f \in C[a,b]$ for which
    f(a) = f(b) is complete.

35. If $M \approx M'$ and M is complete, show that M' is also complete.

36. Show that the metric spaces C[a,b] and C[c,d], under the sup
    metric, are isometric.

37. (Hölder's inequality) Prove Hölder's inequality

    $$\sum_{n=1}^{\infty} |x_n y_n| \le \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p} \left( \sum_{n=1}^{\infty} |y_n|^q \right)^{1/q}$$

    as follows. (Here p, q > 1 satisfy 1/p + 1/q = 1.)

    a) Show that $s = t^{p-1} \Rightarrow t = s^{q-1}$.

    b) Let u and v be positive real numbers, and consider the
       rectangle R in $\mathbb{R}^2$ with corners (0,0), (u,0), (0,v) and
       (u,v), with area uv. Argue geometrically (i.e., draw a picture)
       to show that

       $$uv \le \int_0^u t^{p-1} \, dt + \int_0^v s^{q-1} \, ds$$

       and so

       $$uv \le \frac{u^p}{p} + \frac{v^q}{q}$$

    c) Now let $X = \sum |x_n|^p < \infty$ and $Y = \sum |y_n|^q < \infty$. Apply
       the results of part (b) to

       $$u = \frac{|x_n|}{X^{1/p}}, \quad v = \frac{|y_n|}{Y^{1/q}}$$

       and then sum on n to deduce Hölder's inequality.

38. (Minkowski's inequality) Prove Minkowski's inequality

    $$\left( \sum_{n=1}^{\infty} |x_n + y_n|^p \right)^{1/p} \le \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p} + \left( \sum_{n=1}^{\infty} |y_n|^p \right)^{1/p}$$

    as follows.

    a) Prove it for p = 1 first.

    b) Assume p > 1. Show that

       $$|x_n + y_n|^p \le |x_n| \, |x_n + y_n|^{p-1} + |y_n| \, |x_n + y_n|^{p-1}$$

    c) Sum this from n = 1 to k, and apply Hölder's inequality to
       each sum on the right, to get

       $$\sum_{n=1}^{k} |x_n + y_n|^p \le \left[ \left( \sum_{n=1}^{k} |x_n|^p \right)^{1/p} + \left( \sum_{n=1}^{k} |y_n|^p \right)^{1/p} \right] \left( \sum_{n=1}^{k} |x_n + y_n|^p \right)^{1/q}$$

       Divide both sides of this by the last factor on the right, and
       let $k \to \infty$ to deduce Minkowski's inequality.

39. Prove that $\ell^p$ is a metric space.
Now that we have the necessary background on the topological
properties of metric spaces, we can resume our study of inner product
spaces without qualification as to dimension. As in Chapter 9, we
restrict attention to real and complex inner product spaces. Hence F
will denote either R or C.



A Brief Review 

Let us begin by reviewing some of the results from Chapter 9. 
Recall that an inner product space V over F is a vector space V,
together with an inner product $\langle\,,\rangle: V \times V \to F$. If F = R, then the inner
product is bilinear, and if F = C, the inner product is sesquilinear.

An inner product induces a norm on V, defined by

$$\|v\| = \sqrt{\langle v,v \rangle}$$

We recall in particular the following properties of the norm.

Theorem 13.1

1) (The Cauchy-Schwarz inequality) For all $u,v \in V$,

   $$|\langle u,v \rangle| \le \|u\| \, \|v\|$$

   with equality if and only if u = rv for some $r \in F$.

2) (The triangle inequality) For all $u,v \in V$,

   $$\|u + v\| \le \|u\| + \|v\|$$

   with equality if and only if one of u and v is a nonnegative real
   multiple of the other.

3) (The parallelogram law)

   $$\|u + v\|^2 + \|u - v\|^2 = 2\|u\|^2 + 2\|v\|^2$$ ∎

We have seen that the inner product can be recovered from the 
norm, as follows. 

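Theorem 13.2

1) If V is a real inner product space, then

   $$\langle u,v \rangle = \tfrac{1}{4}\left( \|u + v\|^2 - \|u - v\|^2 \right)$$

2) If V is a complex inner product space, then

   $$\langle u,v \rangle = \tfrac{1}{4}\left( \|u + v\|^2 - \|u - v\|^2 \right) + \tfrac{1}{4} i \left( \|u + iv\|^2 - \|u - iv\|^2 \right)$$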
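The complex identity can be verified numerically on a small example. The sketch below assumes the standard inner product on $\mathbb{C}^2$, taken to be linear in the first coordinate and conjugate linear in the second (an assumption about the convention; with the opposite convention the sign of the imaginary term flips). The names `inner`, `norm_sq` and `polarization` are illustrative.

```python
def inner(u, v):
    """Standard inner product: linear in u, conjugate linear in v (assumed convention)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def norm_sq(u):
    """Squared norm ||u||^2 = <u, u>, which is always real."""
    return inner(u, u).real

def polarization(u, v):
    """Recover <u, v> from norms alone, via the complex polarization identity."""
    def combo(s):
        # the vector u + s*v
        return [a + s * b for a, b in zip(u, v)]
    real_part = (norm_sq(combo(1)) - norm_sq(combo(-1))) / 4
    imag_part = (norm_sq(combo(1j)) - norm_sq(combo(-1j))) / 4
    return complex(real_part, imag_part)

u = [1 + 2j, 3 - 1j]
v = [2 - 1j, 1j]
recovered = polarization(u, v)
direct = inner(u, v)
```

Both `recovered` and `direct` should agree up to floating-point rounding.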

The inner product also induces a metric on V defined by

$$d(u,v) = \|u - v\|$$

Thus, any inner product space is a metric space.

Definition Let V and W be inner product spaces, and let
$\tau \in \mathcal{L}(V,W)$.

1) $\tau$ is an isometry if it preserves the inner product, that is, if

   $$\langle \tau(u),\tau(v) \rangle = \langle u,v \rangle$$

   for all $u,v \in V$.

2) A bijective isometry is called an isometric isomorphism. When
   $\tau: V \to W$ is an isometric isomorphism, we say that V and W
   are isometrically isomorphic. □

It is easy to see that an isometry is always injective but need not
be surjective, even if V = W. (See Example 10.3.)

Theorem 13.3 A linear transformation $\tau \in \mathcal{L}(V,W)$ is an isometry if
and only if it preserves the norm, that is,

$$\|\tau(v)\| = \|v\|$$

for all $v \in V$. ∎

The following result points out one of the main differences
between real and complex inner product spaces.

Theorem 13.4 Let V be an inner product space, and let $\tau \in \mathcal{L}(V)$.

1) If $\langle \tau(v),w \rangle = 0$ for all $v, w \in V$, then $\tau = 0$.

2) If V is a complex inner product space, and $\langle \tau(v),v \rangle = 0$ for all
   $v \in V$, then $\tau = 0$.

3) Part (2) does not hold in general for real inner product spaces. ∎

Hilbert Spaces 

Since an inner product space is a metric space, all that we learned
about metric spaces applies to inner product spaces. In particular, if
$(x_n)$ is a sequence of vectors in an inner product space V, then

$$(x_n) \to x \text{ if and only if } \|x_n - x\| \to 0 \text{ as } n \to \infty$$

The fact that the inner product is continuous as a function of
either of its coordinates is extremely useful.

Theorem 13.5 Let V be an inner product space. Then

1) $(x_n) \to x$, $(y_n) \to y \Rightarrow \langle x_n,y_n \rangle \to \langle x,y \rangle$

2) $(x_n) \to x \Rightarrow \|x_n\| \to \|x\|$ ∎

Complete inner product spaces play an especially important role in 
both theory and practice. 

Definition An inner product space that is complete under the metric 
induced by the inner product is said to be a Hilbert space. D 

Example 13.1 One of the most important examples of a Hilbert space
is the space $\ell^2$ of Example 10.2. Recall that the inner product is
defined by

$$\langle x,y \rangle = \sum_{n=1}^{\infty} x_n \overline{y_n}$$

(In the real case, the conjugate is unnecessary.) The metric induced by
this inner product is

$$d(x,y) = \|x - y\|_2 = \left( \sum_{n=1}^{\infty} |x_n - y_n|^2 \right)^{1/2}$$

which agrees with the definition of the metric space $\ell^2$ given in
Chapter 12. In other words, the metric in Chapter 12 is induced by this
inner product. As we saw in Chapter 12, this inner product space is
complete, and so it is a Hilbert space. (In fact, it is the prototype of all
Hilbert spaces, introduced by David Hilbert in 1912, even before the
axiomatic definition of Hilbert space was given by Johnny von
Neumann in 1927.) □

The previous example raises the question of whether or not the
other metric spaces $\ell^p$ ($p \ne 2$), with distance given by

    (13.1)  $d(x,y) = \|x - y\|_p = \left( \sum_{n=1}^{\infty} |x_n - y_n|^p \right)^{1/p}$

are complete inner product spaces. The fact is that they are not even
inner product spaces! More specifically, there is no inner product whose
induced metric is given by (13.1). To see this, observe that, according
to Theorem 13.1, any norm that comes from an inner product must
satisfy the parallelogram law

    $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$

But the norm in (13.1) does not satisfy this law. To see this, take
$x = (1,1,0,\ldots)$ and $y = (1,-1,0,\ldots)$. Then

    $\|x + y\|_p = 2, \quad \|x - y\|_p = 2$

and

    $\|x\|_p = 2^{1/p}, \quad \|y\|_p = 2^{1/p}$

Thus, the left side of the parallelogram law is 8, and the right side is
$4 \cdot 2^{2/p}$, which equals 8 if and only if $p = 2$.
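
The computation above is easy to check numerically. The following is a minimal sketch, not part of the original text: the helper names `p_norm` and `parallelogram_gap` are hypothetical, and the sequences are truncated to their first three coordinates, which loses nothing because the remaining coordinates are zero.

```python
def p_norm(v, p):
    """Return the l^p norm of a finitely supported sequence v."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def parallelogram_gap(x, y, p):
    """Left side minus right side of the parallelogram law for the p-norm."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    left = p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
    right = 2 * p_norm(x, p) ** 2 + 2 * p_norm(y, p) ** 2
    return left - right


# The vectors used in the text, truncated after the last nonzero entry.
x = [1.0, 1.0, 0.0]
y = [1.0, -1.0, 0.0]

# The gap is 8 - 4 * 2**(2/p); it vanishes (up to rounding) only at p = 2.
gaps = {p: parallelogram_gap(x, y, p) for p in (1, 1.5, 2, 3, 4)}
```

Printing `gaps` shows a negative gap for $p < 2$, a gap of essentially zero at $p = 2$, and a positive gap for $p > 2$.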

Just as any metric space has a completion, so does any inner
product space.

Theorem 13.6 Let $V$ be an inner product space. Then there exists a
Hilbert space $H$ and an isometry $\tau\colon V \to H$ for which $\tau(V)$ is dense in
$H$. Moreover, $H$ is unique up to isometric isomorphism.

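Proof. We know that the metric space $(V,d)$, where $d$ is induced by
the inner product, has a unique completion $(V',d')$, which consists of
equivalence classes of Cauchy sequences in $V$. Write $\overline{(x_n)}$ for the
class of the Cauchy sequence $(x_n)$. If $\overline{(x_n)} \in V'$ and
$\overline{(y_n)} \in V'$, then we set

    $\overline{(x_n)} + \overline{(y_n)} = \overline{(x_n + y_n)}, \quad r\,\overline{(x_n)} = \overline{(r x_n)}$

and

    $\langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle$

It is easy to see that, since $(x_n)$ and $(y_n)$ are Cauchy sequences, so
are $(x_n + y_n)$ and $(r x_n)$. In addition, these operations are well
defined, that is, they are independent of the choice of representative
from each equivalence class. For instance, if $(\hat{x}_n) \in \overline{(x_n)}$ then

    $\lim_{n \to \infty} \|x_n - \hat{x}_n\| = 0$

and so

    $|\langle x_n, y_n \rangle - \langle \hat{x}_n, y_n \rangle| = |\langle x_n - \hat{x}_n, y_n \rangle| \le \|x_n - \hat{x}_n\| \, \|y_n\| \to 0$

(The Cauchy sequence $(y_n)$ is bounded.) Hence,

    $\langle \overline{(x_n)}, \overline{(y_n)} \rangle = \lim_{n \to \infty} \langle x_n, y_n \rangle = \lim_{n \to \infty} \langle \hat{x}_n, y_n \rangle = \langle \overline{(\hat{x}_n)}, \overline{(y_n)} \rangle$

We leave it to the reader to show that $V'$ is an inner product space
under these operations.

Moreover, the inner product on $V'$ induces the metric $d'$, since

    $\langle \overline{(x_n - y_n)}, \overline{(x_n - y_n)} \rangle = \lim_{n \to \infty} \langle x_n - y_n, x_n - y_n \rangle$
    $= \lim_{n \to \infty} d(x_n, y_n)^2 = d'(\overline{(x_n)}, \overline{(y_n)})^2$

Hence, the metric space isometry $\tau\colon V \to V'$, which is linear with
respect to the operations just defined, is an isometry of inner
product spaces, since it preserves the norm (Theorem 13.3):

    $\langle \tau(x), \tau(x) \rangle = d'(\tau(x), \tau(0))^2 = d(x, 0)^2 = \langle x, x \rangle$

Thus, $V'$ is a complete inner product space, and $\tau(V)$ is a dense
subspace of $V'$ that is isometrically isomorphic to $V$. We leave the
issue of uniqueness to the reader. ∎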

The next result concerns subspaces of inner product spaces. 

Theorem 13.7 

1) Any complete subspace of an inner product space is closed. 

2) A subspace of a Hilbert space is a Hilbert space if and only if it is 
closed. 

3) Any finite dimensional subspace of an inner product space is 
closed and complete. 

Proof. Parts (1) and (2) follow from Theorem 12.6. Let us prove that
a finite dimensional subspace $S$ of an inner product space $V$ is
closed. Suppose that $(x_n)$ is a sequence in $S$, $(x_n) \to x$, and $x \notin S$.
Let $\mathcal{B} = \{b_1, \ldots, b_m\}$ be an orthonormal Hamel basis for $S$. The
Fourier expansion

    $s = \sum_{i=1}^{m} \langle x, b_i \rangle b_i$

in $S$ has the property that $x - s \ne 0$ but

    $\langle x - s, b_j \rangle = \langle x, b_j \rangle - \langle s, b_j \rangle = 0$

Thus, if we write $y = x - s$ and $y_n = x_n - s \in S$, the sequence $(y_n)$,
which is in $S$, converges to a vector $y$ that is orthogonal to $S$. But
this is impossible, because $y_n \perp y$ implies that

    $\|y_n - y\|^2 = \|y_n\|^2 + \|y\|^2 \ge \|y\|^2 > 0$

This proves that $S$ is closed.

To see that any finite dimensional subspace S of an inner 
product space is complete, let us embed S (as an inner product space 
in its own right) in its completion S'. Then S (or rather an isometric 
copy of S) is a finite dimensional subspace of a complete inner product 
space S', and as such it is closed. However, S is dense in S' and so 
S = S', which shows that S is complete. I 



Infinite Series 

Since an inner product space allows both addition of vectors and 
convergence of sequences, we can define the concept of infinite sums, or 
infinite series. 

Definition Let $V$ be an inner product space, and let $(x_k)$ be a
sequence in $V$. The nth partial sum of the sequence is
$s_n = x_1 + \cdots + x_n$. If the sequence $(s_n)$ of partial sums converges to a
vector $s \in V$, that is, if

    $\|s_n - s\| \to 0$ as $n \to \infty$

then we say that the series $\sum x_k$ converges to $s$, and write

    $s = \sum_{k=1}^{\infty} x_k$ □

We can also define absolute convergence.

Definition A series $\sum x_k$ is said to be absolutely convergent if the
series

    $\sum_{k=1}^{\infty} \|x_k\|$

converges. □

The key relationship between convergence and absolute
convergence is given in the next theorem. Note that completeness is
required to guarantee that absolute convergence implies convergence.

Theorem 13.8 Let $V$ be an inner product space. Then $V$ is complete
if and only if absolute convergence of a series implies convergence.

Proof. Suppose that $V$ is complete, and that $\sum \|x_k\| < \infty$. Then the
sequence $(s_n)$ of partial sums is a Cauchy sequence, for if $n > m$, we
have

    $\|s_n - s_m\| = \left\| \sum_{k=m+1}^{n} x_k \right\| \le \sum_{k=m+1}^{n} \|x_k\| \to 0$

Hence, the sequence $(s_n)$ converges, that is, the series $\sum x_k$ converges.

Conversely, suppose that absolute convergence implies
convergence, and let $(x_n)$ be a Cauchy sequence in $V$. We wish to
show that this sequence converges. Since $(x_n)$ is a Cauchy sequence,
for each $k > 0$, there exists an $N_k$ with the property that

    $i, j \ge N_k \ \Rightarrow\ \|x_i - x_j\| < \frac{1}{2^k}$

Clearly, we can choose $N_1 < N_2 < \cdots$, in which case

    $\|x_{N_{k+1}} - x_{N_k}\| < \frac{1}{2^k}$

and so

    $\sum_{k=1}^{\infty} \|x_{N_{k+1}} - x_{N_k}\| \le \sum_{k=1}^{\infty} \frac{1}{2^k} < \infty$

Thus, according to hypothesis, the series

    $\sum_{k=1}^{\infty} (x_{N_{k+1}} - x_{N_k})$

converges. But this is a telescoping series, whose nth partial sum is

    $x_{N_{n+1}} - x_{N_1}$

and so the subsequence $(x_{N_k})$ converges. Since any Cauchy sequence
that has a convergent subsequence must itself converge, the sequence
$(x_n)$ converges, and so $V$ is complete. ∎



An Approximation Problem 

Suppose that $V$ is an inner product space, and that $S$ is a
subset of $V$. It is of considerable interest to be able to find, for any
$x \in V$, a vector in $S$ that is closest to $x$ in the metric induced by the
inner product, should such a vector exist. This is the approximation
problem for $V$.

Suppose that $x \in V$, and let

    $\delta = \inf_{s \in S} \|x - s\|$

Then there is a sequence $(s_n)$ in $S$ for which

    $\delta_n = \|x - s_n\| \to \delta$

as shown in Figure 13.1.

Figure 13.1

Let us see what we can learn about this sequence. First, if we let
$y_k = x - s_k$, then according to the parallelogram law

    $\|y_k + y_j\|^2 + \|y_k - y_j\|^2 = 2(\|y_k\|^2 + \|y_j\|^2)$

or

    (13.2)  $\|y_k - y_j\|^2 = 2(\|y_k\|^2 + \|y_j\|^2) - 4 \left\| \frac{y_k + y_j}{2} \right\|^2$

Now, if the set $S$ is convex, that is, if

    $x, y \in S \ \Rightarrow\ rx + (1 - r)y \in S$ for all $0 \le r \le 1$

(in words, $S$ contains the line segment between any two of its points)
then $(s_k + s_j)/2 \in S$ and so

    $\left\| \frac{y_k + y_j}{2} \right\| = \left\| x - \frac{s_k + s_j}{2} \right\| \ge \delta$

Thus, (13.2) gives

    $\|y_k - y_j\|^2 \le 2(\|y_k\|^2 + \|y_j\|^2) - 4\delta^2 \to 0$

as $k, j \to \infty$. Hence, if $S$ is convex, then $(y_n) = (x - s_n)$ is a Cauchy
sequence, and therefore so is $(s_n)$.

If we also require that $S$ be complete, then the Cauchy sequence
$(s_n)$ converges to a vector $\hat{s} \in S$, and by the continuity of the norm,
we must have $\|x - \hat{s}\| = \delta$. Let us summarize and add a remark
about uniqueness.

Theorem 13.9 Let $V$ be an inner product space, and let $S$ be a
complete convex subset of $V$. Then for any $x \in V$, there exists a
unique $\hat{s} \in S$ for which

    $\|x - \hat{s}\| = \inf_{s \in S} \|x - s\|$

The vector $\hat{s}$ is called the best approximation to $x$ in $S$.

Proof. Only the uniqueness remains to be established. Suppose that
$\hat{s}, s' \in S$ satisfy

    $\|x - \hat{s}\| = \delta = \|x - s'\|$

Then, by the parallelogram law,

    $\|\hat{s} - s'\|^2 = \|(x - s') - (x - \hat{s})\|^2$
    $= 2\|x - \hat{s}\|^2 + 2\|x - s'\|^2 - \|2x - \hat{s} - s'\|^2$
    $= 2\|x - \hat{s}\|^2 + 2\|x - s'\|^2 - 4 \left\| x - \frac{\hat{s} + s'}{2} \right\|^2$
    $\le 2\delta^2 + 2\delta^2 - 4\delta^2 = 0$

and so $\hat{s} = s'$. ∎
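
As an illustration of Theorem 13.9, consider the closed unit ball of $\mathbb{R}^3$, which is complete and convex. The sketch below is an assumption-laden example rather than anything from the text: it uses the standard fact that the closest point of the ball to an outside point $x$ is $x/\|x\|$, and the helper names `project_onto_ball` and `random_point_in_ball` are hypothetical. Random sampling is used only as a sanity check that no sampled point of the ball is closer to $x$.

```python
import math
import random


def norm(v):
    """Euclidean norm of a real vector."""
    return math.sqrt(sum(t * t for t in v))


def project_onto_ball(x, radius=1.0):
    """Best approximation to x in the closed ball {s : ||s|| <= radius}."""
    n = norm(x)
    if n <= radius:
        return list(x)                      # x already lies in the ball
    return [radius * t / n for t in x]      # rescale onto the boundary


def random_point_in_ball(dim, radius, rng):
    """Rejection-sample a point of the closed ball (fine for small dim)."""
    while True:
        s = [rng.uniform(-radius, radius) for _ in range(dim)]
        if norm(s) <= radius:
            return s


rng = random.Random(0)
x = [3.0, -4.0, 0.0]
s_hat = project_onto_ball(x)

# delta = ||x - s_hat|| should be a lower bound for every sampled distance.
delta = norm([a - b for a, b in zip(x, s_hat)])
closest_sampled = min(
    norm([a - b for a, b in zip(x, random_point_in_ball(3, 1.0, rng))])
    for _ in range(2000)
)
```

Here `delta` equals $\|x\| - 1$, and `closest_sampled` never falls below it, as the theorem requires.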

Since any subspace $S$ of an inner product space $V$ is convex,
Theorem 13.9 applies to complete subspaces. However, in this case, we
can say more.

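Theorem 13.10 Let $V$ be an inner product space, and let $S$ be a
complete subspace of $V$. Then for any $x \in V$, the best approximation
to $x$ in $S$ is the unique vector $s' \in S$ for which $x - s' \perp S$.

Proof. Suppose that $x - s' \perp S$, where $s' \in S$. Then for any $s \in S$, we
have $x - s' \perp s - s'$ and so

    $\|x - s\|^2 = \|x - s'\|^2 + \|s' - s\|^2 \ge \|x - s'\|^2$

Hence $s' = \hat{s}$ is the best approximation to $x$ in $S$. Now we need only
show that $x - \hat{s} \perp S$, where $\hat{s}$ is the best approximation to $x$ in $S$.
For any nonzero $s \in S$, a little computation reminiscent of completing the
square gives

    $\|x - rs\|^2 = \langle x - rs, x - rs \rangle$
    $= \|x\|^2 - \bar{r}\langle x, s \rangle - r\langle s, x \rangle + r\bar{r}\|s\|^2$
    $= \|x\|^2 + \|s\|^2 \left( r\bar{r} - \bar{r}\,\frac{\langle x, s \rangle}{\|s\|^2} - r\,\frac{\overline{\langle x, s \rangle}}{\|s\|^2} \right)$
    $= \|x\|^2 + \|s\|^2 \left| r - \frac{\langle x, s \rangle}{\|s\|^2} \right|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2}$

Now, the last expression is smallest when

    $r = r_0 = \frac{\langle x, s \rangle}{\|s\|^2}$

in which case

    $\|x - r_0 s\|^2 = \|x\|^2 - \frac{|\langle x, s \rangle|^2}{\|s\|^2}$

Replacing $x$ by $x - \hat{s}$ (so that $r_0 = \langle x - \hat{s}, s \rangle / \|s\|^2$) gives

    $\|x - \hat{s} - r_0 s\|^2 = \|x - \hat{s}\|^2 - \frac{|\langle x - \hat{s}, s \rangle|^2}{\|s\|^2} = \delta^2 - \frac{|\langle x - \hat{s}, s \rangle|^2}{\|s\|^2}$

But $\hat{s} + r_0 s \in S$, and so the left side must be at least $\delta^2$, implying that

    $\frac{|\langle x - \hat{s}, s \rangle|^2}{\|s\|^2} = 0$

or, equivalently,

    $\langle x - \hat{s}, s \rangle = 0$

Hence, $x - \hat{s} \perp S$. ∎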



According to Theorem 13.10, if $S$ is a complete subspace of an
inner product space $V$, then for any $x \in V$, we may write

    $x = \hat{s} + (x - \hat{s})$

where $\hat{s} \in S$ and $x - \hat{s} \in S^\perp$. Hence, $V = S + S^\perp$, and since
$S \cap S^\perp = \{0\}$, we also have $V = S \odot S^\perp$. This is the projection
theorem for arbitrary inner product spaces.
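
In a finite dimensional real space the decomposition of a vector into a component in $S$ and a component orthogonal to $S$ can be computed directly. The following is a minimal sketch, assuming $V = \mathbb{R}^4$ with the dot product and a subspace given by a spanning list; the names `gram_schmidt` and `decompose` are hypothetical helpers, not notation from the text.

```python
def dot(u, v):
    """Standard real inner product."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize real vectors, dropping (numerically) dependent ones."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = dot(w, w) ** 0.5
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def decompose(x, spanning_vectors):
    """Return (s_hat, residual) with s_hat in S and residual orthogonal to S."""
    basis = gram_schmidt(spanning_vectors)
    s_hat = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        s_hat = [si + c * bi for si, bi in zip(s_hat, b)]
    residual = [xi - si for xi, si in zip(x, s_hat)]
    return s_hat, residual


# S is the span of two vectors in R^4 (an arbitrary illustrative choice).
S = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
s_hat, residual = decompose(x, S)
```

The residual is orthogonal to every spanning vector of $S$, and `s_hat` plus `residual` recovers `x`.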



Theorem 13.11 (The projection theorem) If $S$ is a complete subspace
of an inner product space $V$, then

    $V = S \odot S^\perp$

In particular, if $S$ is a closed subspace of a Hilbert space $H$, then

    $H = S \odot S^\perp$ ∎

Theorem 13.12 Let $S$, $T$ and $T'$ be subspaces of an inner product
space $V$.

1) If $V = S \odot T$, then $T = S^\perp$.

2) If $S \odot T = S \odot T'$, then $T = T'$.

Proof. If $V = S \odot T$ then $T \subseteq S^\perp$ by definition of orthogonal direct
sum. On the other hand, if $z \in S^\perp$, then $z = s + t$, for some $s \in S$
and $t \in T$. Hence,

    $0 = \langle z, s \rangle = \langle s, s \rangle + \langle t, s \rangle = \langle s, s \rangle$

and so $s = 0$, implying that $z = t \in T$. Thus, $S^\perp \subseteq T$. Part (2)
follows from part (1). ∎

Let us denote the closure of the span of a set S of vectors by 
cspan(S). 

Theorem 13.13 Let $H$ be a Hilbert space.

1) If $A$ is a subset of $H$, then

    $\mathrm{cspan}(A) = A^{\perp\perp}$

2) If $S$ is a subspace of $H$, then

    $\mathrm{cl}(S) = S^{\perp\perp}$

3) If $K$ is a closed subspace of $H$, then

    $K = K^{\perp\perp}$

Proof. We leave it as an exercise to show that $[\mathrm{cspan}(A)]^\perp = A^\perp$.
Hence

    $H = \mathrm{cspan}(A) \odot [\mathrm{cspan}(A)]^\perp = \mathrm{cspan}(A) \odot A^\perp$

But since $A^\perp$ is closed, we also have

    $H = A^\perp \odot A^{\perp\perp}$

and so by Theorem 13.12, $\mathrm{cspan}(A) = A^{\perp\perp}$. The rest follows easily
from part (1). ∎

In the exercises, we provide an example of a closed subspace $K$ of
an inner product space $V$ for which $K \ne K^{\perp\perp}$. Hence, we cannot drop
the requirement that $H$ be a Hilbert space in Theorem 13.13.

Corollary 13.14 If $A$ is a subset of a Hilbert space $H$ then $\mathrm{span}(A)$
is dense in $H$ if and only if $A^\perp = \{0\}$.

Proof. As in the previous proof,

    $H = \mathrm{cspan}(A) \odot A^\perp$

and so $A^\perp = \{0\}$ if and only if $H = \mathrm{cspan}(A)$. ∎

Hilbert Bases

We recall the following definition from Chapter 9. 

Definition A maximal orthonormal set in a Hilbert space H is called a 
Hilbert basis for H. D 

Zorn’s lemma can be used to show that any nontrivial Hilbert 
space has a Hilbert basis. Again, we should mention that the concepts 
of Hilbert basis and Hamel basis (a maximal linearly independent set) 
are quite different. We will show later in this chapter that any two 
Hilbert bases for a Hilbert space have the same dimension. 

Since an orthonormal set $\mathcal{O}$ is maximal if and only if $\mathcal{O}^\perp = \{0\}$,
Corollary 13.14 gives the following characterization of Hilbert bases.

Theorem 13.15 Let $\mathcal{O}$ be an orthonormal subset of a Hilbert space $H$.
The following are equivalent.

1) $\mathcal{O}$ is a Hilbert basis

2) $\mathcal{O}^\perp = \{0\}$

3) $\mathcal{O}$ is a total subset of $H$, that is, $\mathrm{cspan}(\mathcal{O}) = H$. ∎

Part (3) of this theorem says that a subset of a Hilbert space is a
Hilbert basis if and only if it is a total orthonormal set.



Fourier Expansions 

We now want to take a closer look at best approximations. Our
goal is to find an explicit expression for the best approximation to any
vector $x$ from within a closed subspace $S$ of a Hilbert space $H$. We
will find it convenient to consider three cases, depending on whether $S$
has finite, countably infinite, or uncountable dimension.



The Finite Dimensional Case 

Suppose that $\mathcal{O} = \{u_1, \ldots, u_n\}$ is an orthonormal set in a Hilbert
space $H$. Recall that the Fourier expansion of any $x \in H$, with respect
to $\mathcal{O}$, is given by

    $\hat{x} = \sum_{k=1}^{n} \langle x, u_k \rangle u_k$

where $\langle x, u_k \rangle$ is the Fourier coefficient of $x$ with respect to $u_k$.
Observe that

    $\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$

and so $x - \hat{x} \perp \mathrm{span}(\mathcal{O})$. Thus, according to Theorem 13.10, the
Fourier expansion $\hat{x}$ is the best approximation to $x$ in $\mathrm{span}(\mathcal{O})$.
Moreover, since $x - \hat{x} \perp \hat{x}$, we have

    $\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$

and so

    $\|\hat{x}\| \le \|x\|$

with equality if and only if $x = \hat{x}$, which happens if and only if
$x \in \mathrm{span}(\mathcal{O})$. Let us summarize.

Theorem 13.16 Let $\mathcal{O} = \{u_1, \ldots, u_n\}$ be a finite orthonormal set in a
Hilbert space $H$. For any $x \in H$, the Fourier expansion $\hat{x}$ of $x$ is
the best approximation to $x$ in $\mathrm{span}(\mathcal{O})$. We also have Bessel's
inequality

    $\|\hat{x}\| \le \|x\|$

or, equivalently

    (13.3)  $\sum_{k=1}^{n} |\langle x, u_k \rangle|^2 \le \|x\|^2$

with equality if and only if $x \in \mathrm{span}(\mathcal{O})$. ∎
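
The finite dimensional statement can be checked by hand or by machine. Here is a minimal sketch in $\mathbb{C}^3$, assuming the standard inner product that is linear in the first argument and conjugate linear in the second; the orthonormal pair and the vector `x` are arbitrary illustrative choices, and `fourier_expansion` is a hypothetical helper name.

```python
def inner(x, y):
    """<x, y> = sum x_i * conj(y_i): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def fourier_expansion(x, orthonormal_set):
    """Return the Fourier coefficients <x, u_k> and the expansion x_hat."""
    coeffs = [inner(x, u) for u in orthonormal_set]
    x_hat = [0j] * len(x)
    for c, u in zip(coeffs, orthonormal_set):
        x_hat = [xi + c * ui for xi, ui in zip(x_hat, u)]
    return coeffs, x_hat


r = 2 ** -0.5
# Two orthonormal vectors in C^3; they do not span the whole space.
O = [[r + 0j, r * 1j, 0j], [0j, 0j, 1 + 0j]]
x = [1 + 2j, 3 + 0j, -1j]

coeffs, x_hat = fourier_expansion(x, O)
bessel_left = sum(abs(c) ** 2 for c in coeffs)   # sum of |<x, u_k>|^2
bessel_right = inner(x, x).real                  # ||x||^2
residual = [a - b for a, b in zip(x, x_hat)]     # x - x_hat
```

Since `x` is not in the span of the two vectors, Bessel's inequality is strict here, and `residual` is orthogonal to both vectors of `O`.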



The Countably Infinite Dimensional Case 

In the countably infinite case, we will be dealing with infinite 
sums, and so questions of convergence will arise. Thus, we begin with 
the following. 



Theorem 13.17 Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite
orthonormal set in a Hilbert space $H$. The series

    (13.4)  $\sum_{k=1}^{\infty} r_k u_k$

converges in $H$ if and only if the series

    (13.5)  $\sum_{k=1}^{\infty} |r_k|^2$

converges in $\mathbb{R}$. If these series converge, then they converge
unconditionally (that is, any series formed by rearranging the order of
the terms also converges). Finally, if the series (13.4) converges then

    $\left\| \sum_{k=1}^{\infty} r_k u_k \right\|^2 = \sum_{k=1}^{\infty} |r_k|^2$

Proof. Denote the partial sums of the first series by $s_n$ and the partial
sums of the second series by $p_n$. Then for $m \le n$

    $\|s_n - s_m\|^2 = \left\| \sum_{k=m+1}^{n} r_k u_k \right\|^2 = \sum_{k=m+1}^{n} |r_k|^2 = |p_n - p_m|$

Hence $(s_n)$ is a Cauchy sequence in $H$ if and only if $(p_n)$ is a
Cauchy sequence in $\mathbb{R}$. Since both $H$ and $\mathbb{R}$ are complete, $(s_n)$
converges if and only if $(p_n)$ converges.

If the series (13.5) converges, then it converges absolutely, and
hence unconditionally. (A real series converges unconditionally if and
only if it converges absolutely.) But if (13.5) converges unconditionally,
then so does (13.4). The last part of the theorem follows from the
continuity of the norm. ∎

Now let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite orthonormal set
in $H$. The Fourier expansion of a vector $x \in H$ is defined to be the
sum

    (13.6)  $\hat{x} = \sum_{k=1}^{\infty} \langle x, u_k \rangle u_k$

To see that this sum converges, observe that, for any $n > 0$, (13.3)
gives

    $\sum_{k=1}^{n} |\langle x, u_k \rangle|^2 \le \|x\|^2$

and so

    $\sum_{k=1}^{\infty} |\langle x, u_k \rangle|^2 \le \|x\|^2$

which shows that the series on the left converges. Hence, according to
Theorem 13.17, the Fourier expansion (13.6) converges unconditionally.
Moreover, since the inner product is continuous,

    $\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$

and so $x - \hat{x} \in [\mathrm{span}(\mathcal{O})]^\perp = [\mathrm{cspan}(\mathcal{O})]^\perp$. Hence, $\hat{x}$ is the best
approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again
have

    $\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$

and so

    $\|\hat{x}\| \le \|x\|$

with equality if and only if $x = \hat{x}$, which happens if and only if
$x \in \mathrm{cspan}(\mathcal{O})$. Thus, the following analog of Theorem 13.16 holds.

Theorem 13.18 Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be a countably infinite
orthonormal set in a Hilbert space $H$. For any $x \in H$, the Fourier
expansion

    $\hat{x} = \sum_{k=1}^{\infty} \langle x, u_k \rangle u_k$

of $x$ converges unconditionally and is the best approximation to $x$ in
$\mathrm{cspan}(\mathcal{O})$. We also have Bessel's inequality

    $\|\hat{x}\| \le \|x\|$

or, equivalently

    $\sum_{k=1}^{\infty} |\langle x, u_k \rangle|^2 \le \|x\|^2$

with equality if and only if $x \in \mathrm{cspan}(\mathcal{O})$. ∎
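
The criterion that a series of scalar multiples of orthonormal vectors converges exactly when the squared moduli of the scalars are summable can be explored numerically. The sketch below assumes the orthonormal set is the standard basis $e_1, e_2, \ldots$ of $\ell^2$, so that a partial sum is just a list of coordinates; the two coefficient sequences and the helper names are illustrative assumptions.

```python
import math


def partial_sums_of_squares(r, n):
    """Return [p_1, ..., p_n] where p_j is the sum of |r(k)|^2 for k <= j."""
    total, out = 0.0, []
    for k in range(1, n + 1):
        total += abs(r(k)) ** 2
        out.append(total)
    return out


def tail_norm_squared(r, m, n):
    """||s_n - s_m||^2 for s_j = sum_{k<=j} r(k) e_k, computed coordinatewise."""
    difference = [r(k) for k in range(m + 1, n + 1)]   # coordinates m+1..n
    return sum(abs(t) ** 2 for t in difference)


def square_summable(k):
    return 1.0 / k                  # sum of squares converges to pi^2 / 6


def not_square_summable(k):
    return 1.0 / math.sqrt(k)       # sum of squares is the harmonic series


p = partial_sums_of_squares(square_summable, 20000)
q = partial_sums_of_squares(not_square_summable, 20000)
tail = tail_norm_squared(square_summable, 10000, 20000)
```

For the first sequence the tail norms shrink and `p` levels off near $\pi^2/6$; for the second, `q` keeps growing, so the corresponding series of vectors has no limit in $\ell^2$.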



The Arbitrary Case 

To discuss the case of an arbitrary orthonormal set
$\mathcal{O} = \{u_k \mid k \in K\}$, let us first define and discuss the concept of the sum of an
arbitrary number of terms. (This is a bit of a digression, since we could
proceed without all of the coming details, but they are interesting.)

Definition Let $\mathcal{X} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors in
an inner product space $V$. The sum $\sum_{k \in K} x_k$ is said to converge to a
vector $x \in V$, and we write

    (13.7)  $x = \sum_{k \in K} x_k$

if for any $\varepsilon > 0$, there exists a finite set $S \subseteq K$ for which

    $T \supseteq S$, $T$ finite $\ \Rightarrow\ \left\| \sum_{k \in T} x_k - x \right\| \le \varepsilon$ □

For those readers familiar with the language of convergence of
nets, the set $\mathcal{P}_0(K)$ of all finite subsets of $K$ is a directed set under
inclusion, and the function

    $S \mapsto \sum_{k \in S} x_k$

is a net in $V$. Convergence of (13.7) is convergence of this net. In any
case, we will refer to the preceding definition as the net definition of
convergence.

It is not hard to verify the following basic properties of net 
convergence for arbitrary sums. 

Theorem 13.19 Let $\{x_k \mid k \in K\}$ and $\{y_k \mid k \in K\}$ be arbitrary
families of vectors in an inner product space $V$. If

    $\sum_{k \in K} x_k = x$ and $\sum_{k \in K} y_k = y$

then

1) $\sum_{k \in K} r x_k = rx$ for any $r \in F$

2) $\sum_{k \in K} (x_k + y_k) = x + y$

3) $\sum_{k \in K} \langle x_k, y \rangle = \langle x, y \rangle$ and $\sum_{k \in K} \langle y, x_k \rangle = \langle y, x \rangle$ ∎

The next result gives a useful description of convergence, which 
does not require explicit mention of the sum. 

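Theorem 13.20 Let $\mathcal{X} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors
in an inner product space $V$.

1) If the sum

    $\sum_{k \in K} x_k$

converges, then for any $\varepsilon > 0$, there exists a finite set $I \subseteq K$ such
that

    $J \cap I = \emptyset$, $J$ finite $\ \Rightarrow\ \left\| \sum_{k \in J} x_k \right\| \le \varepsilon$

2) If $V$ is a Hilbert space, then the converse of (1) also holds.

Proof. For part (1), given $\varepsilon > 0$, let $S \subseteq K$, $S$ finite, be such that

    $T \supseteq S$, $T$ finite $\ \Rightarrow\ \left\| \sum_{k \in T} x_k - x \right\| \le \frac{\varepsilon}{2}$

If $J \cap S = \emptyset$, $J$ finite, then

    $\left\| \sum_{J} x_k \right\| = \left\| \left( \sum_{J} x_k + \sum_{S} x_k - x \right) - \left( \sum_{S} x_k - x \right) \right\|$
    $\le \left\| \sum_{J \cup S} x_k - x \right\| + \left\| \sum_{S} x_k - x \right\| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$

As for part (2), for each $n > 0$, let $I_n \subseteq K$ be a finite set for
which

    $J \cap I_n = \emptyset$, $J$ finite $\ \Rightarrow\ \left\| \sum_{j \in J} x_j \right\| \le \frac{1}{n}$

and let

    $y_n = \sum_{k \in I_n} x_k$

Then $(y_n)$ is a Cauchy sequence, since

    $\|y_n - y_m\| = \left\| \sum_{I_n} x_k - \sum_{I_m} x_k \right\| = \left\| \sum_{I_n \setminus I_m} x_k - \sum_{I_m \setminus I_n} x_k \right\|$
    $\le \left\| \sum_{I_n \setminus I_m} x_k \right\| + \left\| \sum_{I_m \setminus I_n} x_k \right\| \le \frac{1}{m} + \frac{1}{n} \to 0$

Since $V$ is assumed complete, we have $(y_n) \to y$.

Now, given $\varepsilon > 0$, there exists an $N$ such that

    $n \ge N \ \Rightarrow\ \|y_n - y\| = \left\| \sum_{I_n} x_k - y \right\| \le \frac{\varepsilon}{2}$

Choosing an integer $n \ge \max\{N, 2/\varepsilon\}$ gives, for $T \supseteq I_n$, $T$ finite,

    $\left\| \sum_{T} x_k - y \right\| = \left\| \sum_{I_n} x_k - y + \sum_{T \setminus I_n} x_k \right\|$
    $\le \left\| \sum_{I_n} x_k - y \right\| + \left\| \sum_{T \setminus I_n} x_k \right\| \le \frac{\varepsilon}{2} + \frac{1}{n} \le \varepsilon$

and so $\sum_{k \in K} x_k$ converges to $y$. ∎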



The following theorem tells us that convergence of an arbitrary 
sum implies something very special about the terms. 

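Theorem 13.21 Let $\mathcal{X} = \{x_k \mid k \in K\}$ be an arbitrary family of vectors
in an inner product space $V$. If the sum

    $\sum_{k \in K} x_k$

converges, then at most a countable number of terms $x_k$ can be
nonzero.

Proof. According to Theorem 13.20, for each $n > 0$, we can let $I_n \subseteq K$,
$I_n$ finite, be such that

    $J \cap I_n = \emptyset$, $J$ finite $\ \Rightarrow\ \left\| \sum_{j \in J} x_j \right\| \le \frac{1}{n}$

Let $I = \bigcup_n I_n$. Then $I$ is countable, and

    $k \notin I \ \Rightarrow\ \{k\} \cap I_n = \emptyset$ for all $n \ \Rightarrow\ \|x_k\| \le \frac{1}{n}$ for all $n \ \Rightarrow\ x_k = 0$ ∎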

Here is the analog of Theorem 13.17. 




Theorem 13.22 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal
family of vectors in a Hilbert space $H$. The two series

    $\sum_{k \in K} r_k u_k$ and $\sum_{k \in K} |r_k|^2$

converge or diverge together. If these series converge then

    $\left\| \sum_{k \in K} r_k u_k \right\|^2 = \sum_{k \in K} |r_k|^2$

Proof. The first series converges if and only if for any $\varepsilon > 0$, there
exists a finite set $I \subseteq K$ such that

    $J \cap I = \emptyset$, $J$ finite $\ \Rightarrow\ \left\| \sum_{k \in J} r_k u_k \right\|^2 \le \varepsilon^2$

or, equivalently

    $J \cap I = \emptyset$, $J$ finite $\ \Rightarrow\ \sum_{k \in J} |r_k|^2 \le \varepsilon^2$

and this is precisely what it means for the second series to converge.
We leave proof of the remaining statement to the reader. ∎



The following is a useful characterization of arbitrary sums of 
nonnegative real terms. 

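Theorem 13.23 Let $\{r_k \mid k \in K\}$ be a collection of nonnegative real
numbers. Then

    (13.8)  $\sum_{k \in K} r_k = \sup_{J \subseteq K,\ J\ \mathrm{finite}} \ \sum_{k \in J} r_k$

provided that either of the preceding expressions is finite.

Proof. Suppose that

    $\sup_{J \subseteq K,\ J\ \mathrm{finite}} \ \sum_{k \in J} r_k = R < \infty$

Then, for any $\varepsilon > 0$, there exists a finite set $S \subseteq K$ such that

    $R \ge \sum_{k \in S} r_k \ge R - \varepsilon$

Hence, if $T \subseteq K$ is a finite set for which $T \supseteq S$, then since $r_k \ge 0$,

    $R \ge \sum_{k \in T} r_k \ge \sum_{k \in S} r_k \ge R - \varepsilon$

and so

    $\left| R - \sum_{k \in T} r_k \right| \le \varepsilon$

which shows that $\sum_{k \in K} r_k$ converges to $R$. Finally, if the sum on the
left of (13.8) converges, then the supremum on the right is finite, and so
(13.8) holds. ∎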



The reader may have noticed that we have two definitions of
convergence for countably infinite series: the net version and the
traditional version involving the limit of partial sums. Let us write

    $\sum_{k \in \mathbb{N}^+} x_k$ and $\sum_{k=1}^{\infty} x_k$

for the net version and the partial sum version, respectively. Here is
the relationship between these two definitions.

Theorem 13.24 Let $H$ be a Hilbert space. If $x_k \in H$ for all $k$, then
the following are equivalent.

1) $\sum_{k \in \mathbb{N}^+} x_k$ converges (net version) to $x$

2) $\sum_{k=1}^{\infty} x_k$ converges unconditionally to $x$

Proof. Assume that (1) holds. Suppose that $\pi$ is any permutation of
$\mathbb{N}^+$. Given any $\varepsilon > 0$, there is a finite set $S \subseteq \mathbb{N}^+$ for which

    $T \supseteq S$, $T$ finite $\ \Rightarrow\ \left\| \sum_{k \in T} x_k - x \right\| \le \varepsilon$

Let us denote the set of integers $\{1, \ldots, n\}$ by $I_n$, and choose a
positive integer $n$ so that $\pi(I_n) \supseteq S$. Then

    $m \ge n \ \Rightarrow\ \pi(I_m) \supseteq \pi(I_n) \supseteq S$
    $\Rightarrow\ \left\| \sum_{k=1}^{m} x_{\pi(k)} - x \right\| = \left\| \sum_{k \in \pi(I_m)} x_k - x \right\| \le \varepsilon$

and so (2) holds.

Next, assume that (2) holds, but that the series in (1) does not
converge. Then there exists an $\varepsilon > 0$ such that, for any finite subset
$I \subseteq \mathbb{N}^+$, there exists a finite subset $J$ with $J \cap I = \emptyset$ for which

    $\left\| \sum_{k \in J} x_k \right\| > \varepsilon$

From this, we deduce the existence of a countably infinite sequence $J_n$
of mutually disjoint finite subsets of $\mathbb{N}^+$ with the property that

    $\max(J_n) = M_n < m_{n+1} = \min(J_{n+1})$

and

    $\left\| \sum_{k \in J_n} x_k \right\| > \varepsilon$

Now, we choose any permutation $\pi\colon \mathbb{N}^+ \to \mathbb{N}^+$ with the following
properties

1) $\pi([m_n, M_n]) \subseteq [m_n, M_n]$

2) if $J_n = \{j_{n,1}, \ldots, j_{n,u_n}\}$, then

    $\pi(m_n) = j_{n,1},\ \pi(m_n + 1) = j_{n,2},\ \ldots,\ \pi(m_n + u_n - 1) = j_{n,u_n}$

The intention in property (2) is that, for each $n$, $\pi$ takes a set of
consecutive integers to the integers in $J_n$.

For any such permutation $\pi$, we have

    $\left\| \sum_{k=m_n}^{m_n + u_n - 1} x_{\pi(k)} \right\| = \left\| \sum_{k \in J_n} x_k \right\| > \varepsilon$

which shows that the sequence of partial sums of the series

    $\sum_{k=1}^{\infty} x_{\pi(k)}$

is not Cauchy, and so this series does not converge. This contradicts
(2), and shows that (2) implies at least that (1) converges. But if (1)
converges to $y \in H$, then since (1) implies (2), and since unconditional
limits are unique, we have $y = x$. Hence, (2) implies (1). ∎
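
The word "unconditionally" matters even in the one dimensional Hilbert space $\mathbb{R}$. As a hedged illustration that is not drawn from the text, the sketch below sums the alternating harmonic series in its usual order and in the classical rearrangement that takes two positive terms followed by one negative term. The closed forms $\ln 2$ and $\tfrac{3}{2}\ln 2$ for the two limits are standard facts assumed here; the function names are hypothetical.

```python
import math


def alternating_harmonic(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def rearranged(n_blocks):
    """Same terms, reordered: two positive terms, then one negative term."""
    total = 0.0
    for j in range(1, n_blocks + 1):
        total += 1.0 / (4 * j - 3) + 1.0 / (4 * j - 1) - 1.0 / (2 * j)
    return total


original_sum = alternating_harmonic(200000)
rearranged_sum = rearranged(200000)
```

The two partial sums settle near different values, so this series converges in the partial sum sense but not unconditionally, and hence (by the theorem) not in the net sense either.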

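Now we can return to a discussion of Fourier expansions. Let
$\mathcal{O} = \{u_k \mid k \in K\}$ be an arbitrary orthonormal set in a Hilbert space $H$.
Given any $x \in H$, we may apply Theorem 13.16 to all finite subsets of
$\mathcal{O}$, to deduce that

    $\sup_{J \subseteq K,\ J\ \mathrm{finite}} \ \sum_{k \in J} |\langle x, u_k \rangle|^2 \le \|x\|^2$

and so Theorem 13.23 tells us that the sum

    $\sum_{k \in K} |\langle x, u_k \rangle|^2$

converges. Hence, according to Theorem 13.22, the Fourier expansion

    $\hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k$

of $x$ also converges, and

    $\|\hat{x}\|^2 = \sum_{k \in K} |\langle x, u_k \rangle|^2$

Note that, according to Theorem 13.21, $\hat{x}$ is a countable sum
of terms of the form $\langle x, u_k \rangle u_k$, and so is in $\mathrm{cspan}(\mathcal{O})$.

In view of part (3) of Theorem 13.19, we have

    $\langle x - \hat{x}, u_k \rangle = \langle x, u_k \rangle - \langle \hat{x}, u_k \rangle = 0$

and so $x - \hat{x} \in [\mathrm{span}(\mathcal{O})]^\perp = [\mathrm{cspan}(\mathcal{O})]^\perp$. Hence, $\hat{x}$ is the best
approximation to $x$ in $\mathrm{cspan}(\mathcal{O})$. Finally, since $x - \hat{x} \perp \hat{x}$, we again
have

    $\|\hat{x}\|^2 = \|x\|^2 - \|x - \hat{x}\|^2 \le \|x\|^2$

and so

    $\|\hat{x}\| \le \|x\|$

with equality if and only if $x = \hat{x}$, which happens if and only if
$x \in \mathrm{cspan}(\mathcal{O})$. Thus, we arrive at the most general form of a key
theorem about Hilbert spaces.

Theorem 13.25 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family of
vectors in a Hilbert space $H$. For any $x \in H$, the Fourier expansion

    $\hat{x} = \sum_{k \in K} \langle x, u_k \rangle u_k$

of $x$ converges in $H$, and is the unique best approximation to $x$ in
$\mathrm{cspan}(\mathcal{O})$. Moreover, we have Bessel's inequality

    $\|\hat{x}\| \le \|x\|$

or, equivalently

    $\sum_{k \in K} |\langle x, u_k \rangle|^2 \le \|x\|^2$

with equality if and only if $x \in \mathrm{cspan}(\mathcal{O})$. ∎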



A Characterization of Hilbert Bases 

Recall from Theorem 13.15 that an orthonormal set
$\mathcal{O} = \{u_k \mid k \in K\}$ in a Hilbert space $H$ is a Hilbert basis if and only if

    $\mathrm{cspan}(\mathcal{O}) = H$

Theorem 13.25 then leads to the following characterization of Hilbert
bases.

Theorem 13.26 Let $\mathcal{O} = \{u_k \mid k \in K\}$ be an orthonormal family in a
Hilbert space $H$. The following are equivalent.

1) $\mathcal{O}$ is a Hilbert basis (a maximal orthonormal set)

2) $\mathcal{O}^\perp = \{0\}$

3) $\mathcal{O}$ is total (that is, $\mathrm{cspan}(\mathcal{O}) = H$)

4) $\hat{x} = x$ for all $x \in H$

5) Equality holds in Bessel's inequality for all $x \in H$, that is,

    $\|x\| = \|\hat{x}\|$

for all $x \in H$.

6) Parseval's identity

    $\langle x, y \rangle = \langle \hat{x}, \hat{y} \rangle$

holds for all $x, y \in H$, that is,

    $\langle x, y \rangle = \sum_{k \in K} \langle x, u_k \rangle \overline{\langle y, u_k \rangle}$

Proof. Parts (1), (2) and (3) are equivalent by Theorem 13.15. Part
(4) implies part (3), since $\hat{x} \in \mathrm{cspan}(\mathcal{O})$, and (3) implies (4) since the
unique best approximation of any $x \in \mathrm{cspan}(\mathcal{O})$ is itself, and so $x = \hat{x}$.
Parts (3) and (5) are equivalent by Theorem 13.25. Parseval's identity
follows from part (4) by part (3) of Theorem 13.19. Finally, Parseval's
identity for $y = x$ implies that equality holds in Bessel's inequality. ∎
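
Parseval's identity can be observed in $\mathbb{C}^n$, where every orthonormal basis is a Hilbert basis. The following is a minimal sketch under stated assumptions: the basis is the normalized discrete Fourier basis, chosen here only because it is easy to write down, and `dft_basis` and `parseval_sum` are hypothetical helper names. Summing over only part of the basis gives the Bessel side of the story.

```python
import cmath


def inner(x, y):
    """<x, y> = sum x_i * conj(y_i)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def dft_basis(n):
    """Orthonormal basis of C^n with u_k(j) = exp(2*pi*i*j*k/n) / sqrt(n)."""
    scale = n ** -0.5
    return [[scale * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)]
            for k in range(n)]


def parseval_sum(x, y, family):
    """Sum over the family of <x, u> * conj(<y, u>)."""
    return sum(inner(x, u) * inner(y, u).conjugate() for u in family)


n = 6
basis = dft_basis(n)
x = [complex(t) for t in (1 + 1j, 2, -1j, 0.5, 3 - 2j, -4)]
y = [complex(t) for t in (2, -1 + 1j, 1j, 1, 0, 2 - 1j)]

full = parseval_sum(x, y, basis)          # should reproduce <x, y>
partial = parseval_sum(x, x, basis[:3])   # at most ||x||^2 by Bessel
```

With the full basis the sum agrees with `inner(x, y)` up to rounding; with only three of the six basis vectors the sum for `y = x` cannot exceed $\|x\|^2$.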



Hilbert Dimension 

We now wish to show that all Hilbert bases for a Hilbert space H 
have the same cardinality, and so we can define the Hilbert dimension 
of H to be that cardinality. 

Theorem 13.27 All Hilbert bases for a Hilbert space $H$ have the same
cardinality. This cardinality is called the Hilbert dimension of $H$. We
will denote the Hilbert dimension of $H$ by $\mathrm{hdim}(H)$.

Proof. If $H$ has a finite Hilbert basis, then that set is also a Hamel
basis, and so all Hilbert bases have size $\dim(H)$. Suppose next that
$\mathcal{B} = \{b_k \mid k \in K\}$ and $\mathcal{C} = \{c_j \mid j \in J\}$ are infinite Hilbert bases for $H$.
Then for each $b_k$, we have

    $b_k = \sum_{j \in J_k} \langle b_k, c_j \rangle c_j$

where $J_k$ is the countable set $\{j \mid \langle b_k, c_j \rangle \ne 0\}$. Moreover, since no $c_j$
can be orthogonal to every $b_k$, we have $\bigcup_{k \in K} J_k = J$. Thus, since each
$J_k$ is countable, Theorem 0.16 gives

    $|J| = \left| \bigcup_{k \in K} J_k \right| \le \aleph_0 |K| = |K|$

By symmetry, we also have $|K| \le |J|$, and so the Schröder-Bernstein
theorem implies that $|J| = |K|$. ∎

Theorem 13.28 Two Hilbert spaces are isometrically isomorphic if and
only if they have the same Hilbert dimension.

Proof. Suppose that $\mathrm{hdim}(H_1) = \mathrm{hdim}(H_2)$. Let $\mathcal{O}_1 = \{u_k \mid k \in K\}$ be
a Hilbert basis for $H_1$ and $\mathcal{O}_2 = \{v_k \mid k \in K\}$ be a Hilbert basis for
$H_2$. We may define a map $\tau\colon H_1 \to H_2$ as follows

    $\tau\left( \sum_{k \in K} r_k u_k \right) = \sum_{k \in K} r_k v_k$

We leave it as an exercise to verify that $\tau$ is a bijective isometry. The
converse is also left as an exercise. ∎

A Characterization of Hilbert Spaces

We have seen that any vector space $V$ is isomorphic to a vector
space $(F^B)_0$ of all functions from $B$ to $F$ that have finite support.
There is a corresponding result for Hilbert spaces. Let $K$ be any
nonempty set, and let

    $\ell^2(K) = \left\{ f\colon K \to \mathbb{C} \ \middle|\ \sum_{k \in K} |f(k)|^2 < \infty \right\}$

The functions in $\ell^2(K)$ are referred to as square summable functions.
(We can also define a real version of this set by replacing $\mathbb{C}$ by $\mathbb{R}$.)
We define an inner product on $\ell^2(K)$ by

    $\langle f, g \rangle = \sum_{k \in K} f(k) \overline{g(k)}$

The proof that $\ell^2(K)$ is a Hilbert space is quite similar to the
proof that $\ell^2 = \ell^2(\mathbb{N})$ is a Hilbert space, and the details are left to the
reader. If we define $\delta_k \in \ell^2(K)$ by

    $\delta_k(j) = 1$ if $j = k$, and $\delta_k(j) = 0$ if $j \ne k$

then the collection

    $\mathcal{O} = \{\delta_k \mid k \in K\}$

is a Hilbert basis for $\ell^2(K)$, of cardinality $|K|$. To see this, observe
that

    $\langle \delta_i, \delta_j \rangle = \sum_{k \in K} \delta_i(k) \overline{\delta_j(k)} = \delta_{ij}$

and so $\mathcal{O}$ is orthonormal. Moreover, if $f \in \ell^2(K)$, then $f(k) \ne 0$ for
only a countable number of $k \in K$, say $\{k_1, k_2, \ldots\}$. If we define $f'$ by

    $f' = \sum_{i=1}^{\infty} f(k_i) \delta_{k_i}$

then $f' \in \mathrm{cspan}(\mathcal{O})$ and $f'(j) = f(j)$ for all $j \in K$, which implies that
$f = f'$. This shows that $\ell^2(K) = \mathrm{cspan}(\mathcal{O})$, and so $\mathcal{O}$ is a total
orthonormal set, that is, a Hilbert basis for $\ell^2(K)$.

Now let $H$ be a Hilbert space, with Hilbert basis
$\mathcal{B} = \{u_k \mid k \in K\}$. We define a map $\phi\colon H \to \ell^2(K)$ as follows. Since $\mathcal{B}$ is a
Hilbert basis, any $x \in H$ has the form

    $x = \sum_{k \in K} \langle x, u_k \rangle u_k$

Since the series on the right converges, Theorem 13.22 implies that the
series

    $\sum_{k \in K} |\langle x, u_k \rangle|^2$

converges. Hence, another application of Theorem 13.22 implies that
the following series converges, and so we may set

    $\phi(x) = \sum_{k \in K} \langle x, u_k \rangle \delta_k$

It follows from Theorem 13.19 that $\phi$ is linear, and it is not hard to
see that it is also bijective. Notice that $\phi(u_k) = \delta_k$, and so $\phi$ takes
the Hilbert basis $\mathcal{B}$ for $H$ to the Hilbert basis $\mathcal{O}$ for $\ell^2(K)$.

Notice also that

    $\|\phi(x)\|^2 = \sum_{k \in K} |\langle x, u_k \rangle|^2 = \left\| \sum_{k \in K} \langle x, u_k \rangle u_k \right\|^2 = \|x\|^2$

and so $\phi$ is an isometric isomorphism. We have proved the following
theorem.

Theorem 13.29 If $H$ is a Hilbert space of Hilbert dimension $\kappa$, and if
$K$ is any set of cardinality $\kappa$, then $H$ is isometrically isomorphic to
$\ell^2(K)$. ∎
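
For functions of finite support, the space of square summable functions on an arbitrary index set is easy to model. The sketch below is illustrative only: it represents a function on $K$ as a Python dictionary whose missing keys stand for the value 0, takes $K$ to be a small set of unrelated hashable labels to stress that no ordering of $K$ is needed, and uses hypothetical helper names (`delta`, `combine`).

```python
def inner(f, g):
    """<f, g> = sum over k of f(k) * conj(g(k)); absent keys mean value 0."""
    return sum(v * complex(g[k]).conjugate() for k, v in f.items() if k in g)


def delta(k):
    """The function delta_k: 1 at k and 0 elsewhere."""
    return {k: 1.0}


def combine(coefficients):
    """Finite sum of c * delta_k over a dict {k: c} of coefficients."""
    out = {}
    for k, c in coefficients.items():
        for j, v in delta(k).items():
            out[j] = out.get(j, 0) + c * v
    return out


K = ["red", ("a", 1), 3.5]            # any set of hashable labels will do
f = {"red": 2 - 1j, ("a", 1): 0.5j, 3.5: -3}

# Gram matrix of the delta functions: should be the identity matrix.
gram = [[inner(delta(i), delta(j)) for j in K] for i in K]

# Rebuild f from its Fourier coefficients <f, delta_k>.
rebuilt = combine({k: inner(f, delta(k)) for k in K})
norm_squared = inner(f, f).real
```

The Gram matrix of the functions `delta(k)` is the identity, and `rebuilt` agrees with `f`, mirroring the argument that these functions form a total orthonormal set.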



The Riesz Representation Theorem 

We conclude our discussion of Hilbert spaces by discussing the
Riesz representation theorem. As it happens, not all linear functionals
on a Hilbert space have the form "take the inner product with...," as
in the finite dimensional case. To see this, observe that if $y \in H$, then
the function

    $f_y(x) = \langle x, y \rangle$

is certainly a linear functional on $H$. However, it has a special
property. In particular, the Cauchy-Schwarz inequality gives, for all
$x \in H$

    $|f_y(x)| = |\langle x, y \rangle| \le \|x\| \, \|y\|$

or, for all $x \ne 0$,

    $\frac{|f_y(x)|}{\|x\|} \le \|y\|$

Noticing that equality holds if $x = y$, we have

    $\sup_{x \ne 0} \frac{|f_y(x)|}{\|x\|} = \|y\|$

This prompts us to make the following definition, which we do for
linear transformations between Hilbert spaces (this covers the case of
linear functionals).

Definition Let $\tau\colon H_1 \to H_2$ be a linear transformation from $H_1$ to $H_2$.
Then $\tau$ is said to be bounded if

    $\sup_{x \ne 0} \frac{\|\tau(x)\|}{\|x\|} < \infty$

If the supremum on the left is finite, we denote it by $\|\tau\|$ and call it
the norm of $\tau$. □

Of course, if $f\colon H \to F$ is a bounded linear functional on $H$, then

    $\|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|}$

The set of all bounded linear functionals on a Hilbert space H is called 
the continuous dual space, or conjugate space, of H, and denoted by 
H*. Note that this differs from the algebraic dual of H, which is the 
set of all linear functionals on H. In the finite dimensional case, 
however, since all linear functionals are bounded (exercise), the two 
concepts agree. (Unfortunately, there is no universal agreement on the 
notation for the algebraic dual versus the continuous dual. Since we 
will discuss only the continuous dual in this section, no confusion should 
arise.) 

The following theorem gives some simple reformulations of the 
definition of norm. 



Theorem 13.30 Let $\tau\colon H_1 \to H_2$ be a bounded linear transformation.

1) $\|\tau\| = \sup_{\|x\| = 1} \|\tau(x)\|$

2) $\|\tau\| = \sup_{\|x\| \le 1} \|\tau(x)\|$

3) $\|\tau\| = \inf\{c \in \mathbb{R} \mid \|\tau(x)\| \le c\|x\|$ for all $x \in H_1\}$ ∎
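
To make the first reformulation concrete, one can estimate the supremum over unit vectors for a small real matrix and compare it with the exact norm. The sketch below assumes $H_1 = H_2 = \mathbb{R}^2$ with the Euclidean norm and relies on the standard fact, not proved in the text, that the operator norm of a real matrix $A$ is the square root of the largest eigenvalue of $A^T A$; the matrix and helper names are illustrative.

```python
import math


def apply(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def sampled_operator_norm(A, samples=100000):
    """Approximate the sup of ||A x|| over unit vectors x = (cos t, sin t)."""
    best = 0.0
    for i in range(samples):
        t = math.pi * i / samples          # x and -x give the same ||A x||
        best = max(best, norm(apply(A, [math.cos(t), math.sin(t)])))
    return best


def exact_operator_norm_2x2(A):
    """Largest singular value: square root of the top eigenvalue of A^T A."""
    (a, b), (c, d) = A
    trace = a * a + b * b + c * c + d * d      # trace of A^T A
    det = (a * d - b * c) ** 2                 # determinant of A^T A
    return math.sqrt((trace + math.sqrt(trace * trace - 4 * det)) / 2)


A = [[3.0, 1.0], [0.0, 2.0]]
approx = sampled_operator_norm(A)
exact = exact_operator_norm_2x2(A)
```

The sampled value approaches the exact norm from below, and the exact norm is the smallest constant $c$ with $\|Ax\| \le c\|x\|$, in line with part (3).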



The following theorem explains the importance of bounded linear 
transformations. 



Theorem 13.31 Let $\tau\colon H_1 \to H_2$ be a linear transformation. The
following are equivalent.

1) $\tau$ is bounded

2) $\tau$ is continuous at some point $x_0 \in H_1$

3) $\tau$ is continuous.

Proof. Suppose that $\tau$ is bounded. Then

    $\|\tau(x) - \tau(x_0)\| = \|\tau(x - x_0)\| \le \|\tau\| \, \|x - x_0\| \to 0$

as $x \to x_0$. Hence, $\tau$ is continuous at $x_0$. Thus, (1) implies (2). If (2)
holds, then for any $y \in H_1$, we have

    $\|\tau(x) - \tau(y)\| = \|\tau(x - y + x_0) - \tau(x_0)\| \to 0$

as $x \to y$, since $\tau$ is continuous at $x_0$, and $x - y + x_0 \to x_0$ as $x \to y$.
Hence, $\tau$ is continuous at any $y \in H_1$, and (3) holds. Finally, suppose
that (3) holds. Thus, $\tau$ is continuous at 0, and so there exists a
$\delta > 0$ such that

    $\|x\| \le \delta \ \Rightarrow\ \|\tau(x)\| \le 1$

In particular,

    $\|x\| = \delta \ \Rightarrow\ \frac{\|\tau(x)\|}{\|x\|} \le \frac{1}{\delta}$

and so

    $\|x\| = 1 \ \Rightarrow\ \|\delta x\| = \delta \ \Rightarrow\ \frac{\|\tau(\delta x)\|}{\|\delta x\|} \le \frac{1}{\delta} \ \Rightarrow\ \frac{\|\tau(x)\|}{\|x\|} \le \frac{1}{\delta}$

Thus, $\tau$ is bounded. ∎



Now we can state and prove the Riesz representation theorem. 



Theorem 13.32 (The Riesz representation theorem) Let $H$ be a
Hilbert space. For any bounded linear functional $f$ on $H$, there is a
unique $z_0 \in H$ such that

\[ f(x) = \langle x, z_0 \rangle \]

for all $x \in H$. Moreover, $\|z_0\| = \|f\|$.

Proof. If $f = 0$, we may take $z_0 = 0$, so let us assume that $f \neq 0$.
Hence, $K = \ker(f) \neq H$, and since $f$ is continuous, $K$ is closed. Thus

\[ H = K \odot K^{\perp} \]

Now, the first isomorphism theorem, applied to the linear functional
$f: H \to F$, implies that $H/K \approx F$ (as vector spaces). In addition,
Theorem 3.5 implies that $H/K \approx K^{\perp}$, and so $K^{\perp} \approx F$.
In particular, $\dim(K^{\perp}) = 1$.

For any $z \in K^{\perp}$, we have

\[ x \in K \Rightarrow f(x) = 0 = \langle x, z \rangle \]

Since $\dim(K^{\perp}) = 1$, all we need do is find a $0 \neq z \in K^{\perp}$
for which

\[ f(z) = \langle z, z \rangle \]

for then $f(rz) = rf(z) = r\langle z, z \rangle = \langle rz, z \rangle$ for
all $r \in F$, showing that $f(x) = \langle x, z \rangle$ for $x \in K^{\perp}$
as well. But if $0 \neq w \in K^{\perp}$, then the vector

\[ z_0 = \frac{\overline{f(w)}}{\|w\|^2} \, w \]

has this property, as can be easily checked. The fact that $\|z_0\| =
\|f\|$ has already been established. ∎
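The theorem can be checked by hand in the finite dimensional case. The following minimal sketch assumes $H = \mathbb{C}^3$ with the standard inner product (linear in the first argument) and a hypothetical functional $f(x) = \sum c_i x_i$. Under those assumptions the representing vector is the conjugate of the coefficient vector.

```python
import math

def inner(x, y):
    """Standard inner product on C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

coeffs = [1 + 2j, -3j, 0.5]            # hypothetical functional f(x) = sum c_i x_i

def f(x):
    return sum(c * a for c, a in zip(coeffs, x))

z0 = [c.conjugate() for c in coeffs]   # candidate Riesz representer

x = [2 - 1j, 4, 1j]                    # an arbitrary test vector
unit = [a / norm(z0) for a in z0]      # unit vector at which |f| attains ||z0||
```

Here `f(x)` agrees with `inner(x, z0)`, and `abs(f(unit))` equals `norm(z0)`, which is the finite dimensional form of $\|z_0\| = \|f\|$.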



EXERCISES 

1. Prove that the sup metric on the metric space $C[a,b]$ of
continuous functions on $[a,b]$ does not come from an inner
product. Hint: let $f(t) = 1$ and $g(t) = (t - a)/(b - a)$, and
consider the parallelogram law.

2. Prove that any Cauchy sequence that has a convergent
subsequence must itself converge.

3. Let $V$ be an inner product space, and let $A$ and $B$ be subsets
of $V$. Show that

a) $A \subseteq B \Rightarrow B^{\perp} \subseteq A^{\perp}$

b) $A^{\perp}$ is a closed subspace of $V$

c) $[\mathrm{cspan}(A)]^{\perp} = A^{\perp}$

4. Let $V$ be an inner product space and $S \subseteq V$. Under what
conditions is $S^{\perp\perp\perp} = S^{\perp}$?

5. Prove that a subspace $S$ of a Hilbert space $H$ is closed if and
only if $S = S^{\perp\perp}$.

6. Let $V$ be the subspace of $\ell^2$ consisting of all sequences of real
numbers, with the property that each sequence has only a finite
number of nonzero terms. Thus, $V$ is an inner product space.
Let $K$ be the subspace of $V$ consisting of all sequences $x = (x_n)$
in $V$ with the property that $\sum x_n/n = 0$. Show that $K$ is
closed, but that $K^{\perp\perp} \neq K$. Hint: For the latter, show that
$K^{\perp} = \{0\}$ by considering the sequences $u = (1, \ldots, -n, \ldots)$,
where the term $-n$ is in the $n$th coordinate position.

7. Let $\mathcal{O} = \{u_1, u_2, \ldots\}$ be an orthonormal set in $H$. If
$x = \sum r_k u_k$ converges, show that

\[ \|x\|^2 = \sum_{k=1}^{\infty} |r_k|^2 \]

8. Prove that if an infinite series

\[ \sum_{k=1}^{\infty} x_k \]

converges absolutely in a Hilbert space $H$, then it also converges
in the sense of the "net" definition given in this section.

9. Let $\{r_k \mid k \in K\}$ be a collection of nonnegative real numbers. If
the sum on the left below converges, show that

\[ \sum_{k \in K} r_k = \sup_{J \subseteq K,\ J \text{ finite}} \ \sum_{k \in J} r_k \]

10. Find a countably infinite sum of real numbers that converges in
the sense of partial sums, but not in the sense of nets.

11. Prove that if a Hilbert space $H$ has infinite Hilbert dimension,
then no Hilbert basis for $H$ is a Hamel basis.

12. Prove that $\ell^2(K)$ is a Hilbert space for any nonempty set $K$.

13. Prove that any linear transformation between finite dimensional
Hilbert spaces is bounded.

14. Prove that if $f \in H^*$, then $\ker(f)$ is a closed subspace of $H$.

15. Prove that a Hilbert space $H$ is separable if and only if
$\mathrm{hdim}(H) \leq \aleph_0$.

16. Can a Hilbert space have countably infinite Hamel dimension?

17. What is the Hamel dimension of $\ell^2(\mathbb{N})$?

18. Let $\tau$ and $\sigma$ be bounded linear operators on $H$. Verify the
following.

a) $\|r\tau\| = |r| \, \|\tau\|$

b) $\|\tau + \sigma\| \leq \|\tau\| + \|\sigma\|$

c) $\|\tau\sigma\| \leq \|\tau\| \, \|\sigma\|$

19. Use the Riesz representation theorem to show that $H^* \approx H$ for
any Hilbert space $H$.




CHAPTER 14 

Tensor Products 



Contents: Free Vector Spaces. Another Look at the Direct Sum. 

Bilinear Maps and Tensor Products. Properties of the Tensor Product. 
The Tensor Product of Linear Transformations. Change of Base Field. 
Multilinear Maps and Iterated Tensor Products. Alternating Maps and 
Exterior Products. Exercises. 

In the preceding chapters, we have seen several ways to construct
new vector spaces from old ones. Two of the most important such
constructions are the direct sum $U \oplus V$ and the set $\mathcal{L}(U,V)$
of all linear transformations from $U$ to $V$. In this chapter, we consider
another construction, known as the tensor product.

There are several ways to define the tensor product but, 
unfortunately, they are all a bit less perspicuous than one might like. 
Therefore, in order to provide some motivation, we will first recast the 
definition of the familiar external direct sum. In order to do this (and 
to define tensor products) we need the concept of a free vector space. 



Free Vector Spaces 

Let $F$ be a field. Given any nonempty set $X$, we may construct
a vector space $\mathcal{F}_X$ over $F$ with $X$ as basis, simply by taking
$\mathcal{F}_X$ to be the set of all formal finite linear combinations of
elements of $X$

\[ \mathcal{F}_X = \Big\{ \sum_{\text{finite}} r_i x_i \ \Big|\ x_i \in X,\ r_i \in F \Big\} \]

where the operations are as expected: combine like terms using the
rules

\[ r x_i + s x_i = (r + s) x_i \]

and

\[ r(s x_i) = (rs) x_i \]

The vector space $\mathcal{F}_X$ is called the free vector space on $X$. The
term free is meant to connote the fact that there is no relationship between
the elements of $X$.

In fact, any vector space V is the free vector space on any basis 
for V. Thus, in some sense, we have introduced nothing new. 
However, the concept of free object occurs in many other contexts, as 
we have seen with regard to modules, where not all modules are free. 
Moreover, even in the context of vector spaces, it gives us a new 
viewpoint from which to develop new ideas. 

We may characterize the free vector space $\mathcal{F}_X$ as the set
$(F^X)_0$ of all functions from $X$ to $F$ that have finite support. Recall
that the support of a function $f: X \to F$ is defined by

\[ \mathrm{supp}(f) = \{x \in X \mid f(x) \neq 0\} \]

It is easy to see that a function $f: X \to F$ with finite support corresponds
to a finite sum of elements of $X$, via

\[ f \leftrightarrow \sum_{x_i \in \mathrm{supp}(f)} f(x_i) x_i \]

and therefore that the two constructions of $\mathcal{F}_X$ are equivalent. We
will feel free to use either construction.
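The finite-support description translates directly into a data structure. The sketch below is one possible model, not the text's construction verbatim: an element of the free vector space over the rationals is stored as a dictionary from elements of $X$ to nonzero coefficients. The names `add`, `scale` and `j` are illustrative.

```python
from fractions import Fraction

def add(f, g):
    """Sum of two finitely supported functions X -> F, stored as dicts."""
    total = dict(f)
    for x, r in g.items():
        total[x] = total.get(x, 0) + r
    return {x: r for x, r in total.items() if r != 0}   # keep only the support

def scale(r, f):
    """Scalar multiple r*f, again dropping zero coefficients."""
    return {x: r * s for x, s in f.items() if r * s != 0}

def j(x):
    """Canonical injection: x goes to the formal sum 1*x."""
    return {x: Fraction(1)}

# 2*apple + 3*pear, and -2*apple + plum, over the set X of fruit names
u = add(scale(Fraction(2), j("apple")), scale(Fraction(3), j("pear")))
v = add(scale(Fraction(-2), j("apple")), j("plum"))
w = add(u, v)   # like terms combine, and the apple term cancels
```

The keys play the role of the basis $X$. No relation between distinct keys is ever used, which is the sense in which the space is free.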

We can express the concept of freeness in a much more general
way as follows. Consider the map $j: X \to \mathcal{F}_X$ defined by $j(x) = x$,
and called the canonical injection of $X$ into $\mathcal{F}_X$. The pair
$(\mathcal{F}_X, j)$ has a very special property. Referring to Figure 14.1, if
$f: X \to V$ is any map from $X$ to any vector space $V$, then there is a unique
linear transformation $\tau$ from $\mathcal{F}_X$ to $V$ for which $\tau \circ j = f$.

Figure 14.1 (commutative triangle: $j: X \to \mathcal{F}_X$, $f: X \to V$,
$\tau: \mathcal{F}_X \to V$)

For if $f: X \to V$, then we can define a linear transformation
$\tau: \mathcal{F}_X \to V$ by setting $\tau(x) = f(x)$ and extending by linearity
to $\mathcal{F}_X$. This is legitimate since $X$ is a basis for $\mathcal{F}_X$.
The uniqueness of $\tau$ also follows from the fact that $X$ is a basis for
$\mathcal{F}_X$.
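Extension by linearity can be sketched in a few lines. The example assumes the dictionary model of formal sums (keys are elements of $X$, values are coefficients) and takes the target space to be $\mathbb{R}^2$. The names `extend`, `add2` and `scale2` are hypothetical.

```python
def j(x):
    """Canonical injection of X into the free vector space (dict model)."""
    return {x: 1}

def extend(f, add, scale, zero):
    """Return the linear map tau on formal sums determined by tau(j(x)) = f(x)."""
    def tau(formal_sum):
        result = zero
        for x, r in formal_sum.items():
            result = add(result, scale(r, f(x)))
        return result
    return tau

# Target space V = R^2 with componentwise operations.
add2 = lambda p, q: (p[0] + q[0], p[1] + q[1])
scale2 = lambda r, p: (r * p[0], r * p[1])

f = {"a": (1, 0), "b": (1, 1), "c": (0, 2)}.get   # an arbitrary map X -> V
tau = extend(f, add2, scale2, (0, 0))
```

By construction `tau(j(x)) == f(x)` for every `x`, so the triangle commutes. The values on the basis leave no other choice for a linear `tau`, which is the uniqueness claim.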

When any two paths in a diagram, such as Figure 14.1, that begin
and end at the same locations describe equal functions, we say that the
diagram commutes. Thus, saying that $\tau \circ j = f$ is the same as saying
that the diagram in Figure 14.1 commutes. We can also describe this
situation by saying that any function $f: X \to V$ can be factored through
the canonical injection $j$.

Now, it so happens that the commutativity of Figure 14.1, and
the uniqueness of $\tau$, completely determine the pair $(\mathcal{F}_X, j)$,
up to isomorphism. More specifically, we have the following, known as the
universal property of the free vector space $\mathcal{F}_X$.

Theorem 14.1 (The Universal Property of Free Vector Spaces) Let
$X$ be a nonempty set. Suppose that $\mathcal{S}$ is a vector space over $F$,
and $k: X \to \mathcal{S}$ is a function, and that the pair $(\mathcal{S}, k)$
has the following property. Referring to Figure 14.2, for any function
$f: X \to V$, where $V$ is a vector space over $F$, there exists a unique linear
transformation $\tau: \mathcal{S} \to V$ for which $\tau \circ k = f$, that is,
for which the diagram in Figure 14.2 commutes. Then $\mathcal{S}$ is isomorphic
to the free vector space $\mathcal{F}_X$.

Figure 14.2 (commutative triangle: $k: X \to \mathcal{S}$, $f: X \to V$,
$\tau: \mathcal{S} \to V$)

Proof. Consider the diagrams in Figure 14.3. The first diagram reflects
the fact that we may put $V = \mathcal{S}$ and $f = k$ in Figure 14.1. Since
this diagram commutes, we have

\[ \tau \circ j = k \]

The second diagram reflects the fact that we may set $V = \mathcal{F}_X$ and
$f = j$ in Figure 14.2. Since this diagram commutes, we have

\[ \sigma \circ k = j \]

Making the appropriate substitutions gives

\[ \tau \circ \sigma \circ k = k \quad \text{and} \quad \sigma \circ \tau \circ j = j \]

But, the third commutative diagram in Figure 14.3 indicates that the
identity $\iota$ is the unique linear transformation for which
$\iota \circ k = k$, and so $\tau \circ \sigma = \iota$. Similarly, by drawing
the appropriate commutative diagram, we deduce that $\sigma \circ \tau = \iota$.
Thus, $\tau$ is an isomorphism from $\mathcal{F}_X$ to $\mathcal{S}$. ∎

Figure 14.3 (the three commutative diagrams used in the proof)

Another Look at the Direct Sum 

By way of motivation for defining tensor products, let us take
another look at the external direct sum construction. Our plan is to
characterize this sum in three different ways.

First, we have the definition. Suppose that $U$ and $V$ are vector
spaces over the same field $F$. The external direct sum $U \boxplus V$ is the
vector space of all ordered pairs

\[ U \boxplus V = \{(u,v) \mid u \in U,\ v \in V\} \]

with coordinatewise operations

\[ (u,v) + (u',v') = (u + u', v + v') \]

and

\[ r(u,v) = (ru, rv) \]

For the second characterization, we begin by considering the
Cartesian product $U \times V$, which is simply the set of all ordered pairs

\[ U \times V = \{(u,v) \mid u \in U,\ v \in V\} \]

with no algebraic structure. Let $\mathcal{F}_{U \times V}$ be the free vector
space on $U \times V$. Thus,

(14.1) \[ \mathcal{F}_{U \times V} = \Big\{ \sum_{\text{finite}} r_i (u_i, v_i) \ \Big|\ (u_i, v_i) \in U \times V,\ r_i \in F \Big\} \]

It is important to keep in mind that we allow no manipulations of the
coordinates of the ordered pairs in $\mathcal{F}_{U \times V}$. For instance, we
cannot replace $r(u,v)$ by $(ru,rv)$ nor $(u,v) + (u',v')$ by $(u + u', v + v')$.
In a sense, the ordered pairs in (14.1) act simply as "placekeepers" to
separate the coefficients $r_i$.

In fact, the difference between $\mathcal{F}_{U \times V}$ and $U \boxplus V$ is
that, in $U \boxplus V$, we do have

\[ r(u,v) - (ru,rv) = 0 \]

and

\[ (u,v) + (u',v') - (u + u', v + v') = 0 \]

for all $r \in F$, $u, u' \in U$ and $v, v' \in V$.

Let us define $S$ to be the subspace of $\mathcal{F}_{U \times V}$ generated by
all vectors of the form

\[ r(u,v) - (ru,rv) \]

and

\[ (u,v) + (u',v') - (u + u', v + v') \]

for all $r \in F$, $u, u' \in U$ and $v, v' \in V$. It seems reasonable that the
quotient space $\mathcal{F}_{U \times V}/S$ should be isomorphic to the direct sum
$U \boxplus V$.

To prove this, consider the map $\tau: \mathcal{F}_{U \times V}/S \to U \boxplus V$
defined by

\[ \tau\Big( \sum r_i (u_i, v_i) + S \Big) = \sum r_i (u_i, v_i) \]

where the sum on the right is computed in $U \boxplus V$. This map is
well-defined, since if

\[ \sum r_i (u_i, v_i) + S = \sum s_j (x_j, y_j) + S \]

then

\[ \sum r_i (u_i, v_i) - \sum s_j (x_j, y_j) \in S \]

But any element of $S$ is equal to the zero vector in $U \boxplus V$, and so the
vectors $\sum r_i (u_i, v_i)$ and $\sum s_j (x_j, y_j)$ are equal in
$U \boxplus V$. Hence,

(14.2) \[ \tau\Big( \sum r_i (u_i, v_i) + S \Big) = \tau\Big( \sum s_j (x_j, y_j) + S \Big) \]

Furthermore, $\tau$ is linear, and surjective. To see that $\tau$ is injective,
we must show that if

(14.3) \[ \sum r_i (u_i, v_i) = 0 \quad \text{in } U \boxplus V \]

then

\[ \sum r_i (u_i, v_i) \in S \]

To this end, observe that, as formal sums, $\sum r_i (u_i, v_i) \in S$ if and
only if the sum that results by replacing any terms, using the rules

\[ r(u,v) \to (ru,rv), \quad (ru,rv) \to r(u,v) \]

or

\[ (u,v) + (u',v') \to (u + u', v + v'), \quad (u + u', v + v') \to (u,v) + (u',v') \]

is also in $S$. Hence, since (14.3) simply says that, by performing such
replacements, we may reduce $\sum r_i (u_i, v_i)$ to 0, which is in $S$, the sum
$\sum r_i (u_i, v_i)$ must be in $S$. Thus, $\tau$ is an isomorphism from
$\mathcal{F}_{U \times V}/S$ to $U \boxplus V$.
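The distinction between a formal sum and its value in the direct sum can be made concrete. In the following sketch, which assumes $U = V = \mathbb{Q}$ and uses hypothetical helper names, a formal sum is a dictionary keyed by ordered pairs, and `evaluate` carries out the sum coordinatewise. The generators of $S$ are nonzero as formal sums but evaluate to zero.

```python
def formal(*terms):
    """Build a formal sum from (coefficient, (u, v)) terms, merging identical pairs only."""
    total = {}
    for r, pair in terms:
        total[pair] = total.get(pair, 0) + r
    return {pair: r for pair, r in total.items() if r != 0}

def evaluate(formal_sum):
    """Carry out the sum inside the direct sum, coordinatewise."""
    u = sum(r * pair[0] for pair, r in formal_sum.items())
    v = sum(r * pair[1] for pair, r in formal_sum.items())
    return (u, v)

# Generators of S: r(u,v) - (ru,rv) and (u,v) + (u',v') - (u+u',v+v').
g1 = formal((3, (1, 2)), (-1, (3, 6)))
g2 = formal((1, (1, 2)), (1, (4, -1)), (-1, (5, 1)))
```

Both `g1` and `g2` keep all of their terms, since the pairs act only as placekeepers, yet `evaluate` sends each to `(0, 0)`. Passing to the quotient by $S$ is what forces these formal expressions to vanish.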

As can be seen from the previous paragraph, it can be a bit
awkward to describe $U \boxplus V$ as a quotient space. However, we do have
another characterization, in terms of commutative diagrams.
Associated with the direct sum $U \boxplus V$ are the two projections
$\rho_1: U \boxplus V \to U$ and $\rho_2: U \boxplus V \to V$ defined by

\[ \rho_1((u,v)) = u \quad \text{and} \quad \rho_2((u,v)) = v \]

Let us consider the triple $(U \boxplus V, \rho_1, \rho_2)$. Referring to
Figure 14.4, if $W$ is any vector space over $F$, with linear maps
$f_1: W \to U$ and $f_2: W \to V$, then there exists a unique linear
transformation $\tau: W \to U \boxplus V$ for which the diagram commutes, that
is, for which

\[ \rho_1 \tau = f_1 \quad \text{and} \quad \rho_2 \tau = f_2 \]

Figure 14.4 (diagram: $\tau: W \to U \boxplus V$, with $\rho_1, f_1$ mapping to
$U$ and $\rho_2, f_2$ mapping to $V$)

To see this, observe that, if such a $\tau$ were to exist, then we would
have

\[ \rho_1(\tau(w)) = f_1(w) \quad \text{and} \quad \rho_2(\tau(w)) = f_2(w) \]

and so we must have

(14.4) \[ \tau(w) = (f_1(w), f_2(w)) \]

We leave it to the reader to show that this actually defines a unique
linear transformation $\tau$ from $W$ to $U \boxplus V$. The following theorem
shows that this property characterizes the direct sum. The proof is very
similar to that of Theorem 14.1.

Theorem 14.2 (The universal property of external direct sums) Let
$U$ and $V$ be vector spaces over $F$. Let $D$ be a vector space over $F$,
and let $\sigma_1: D \to U$ and $\sigma_2: D \to V$ be linear transformations,
as in Figure 14.5. Suppose that the triple $(D, \sigma_1, \sigma_2)$ has the
following property. If $W$ is any vector space over $F$, and if
$f_1: W \to U$ and $f_2: W \to V$ are linear transformations, then there exists
a unique linear transformation $\tau: W \to D$ that makes the diagram commute,
that is, for which

\[ \sigma_1 \tau = f_1 \quad \text{and} \quad \sigma_2 \tau = f_2 \]

Then $D$ is isomorphic to the external direct sum $U \boxplus V$. ∎

In summary, we have three equivalent characterizations of the
external direct sum $U \boxplus V$:

1) The definition: $U \boxplus V = \{(u,v) \mid u \in U,\ v \in V\}$

2) The quotient space $\mathcal{F}_{U \times V}/S$, where
$\mathcal{F}_{U \times V}$ is the free vector space on $U \times V$ and

\[ S = \mathrm{span}\{r(u,v) - (ru,rv),\ (u,v) + (u',v') - (u + u', v + v')\} \]

3) By the universal property of external direct sums given in
Theorem 14.2.



Bilinear Maps and Tensor Products 

Before defining tensor products, we need a preliminary definition. 

Definition Let $U$, $V$ and $W$ be vector spaces over $F$. A function
$f: U \times V \to W$ is bilinear if it is linear in both variables separately,
that is,

\[ f(ru + su', v) = rf(u,v) + sf(u',v) \]

and

\[ f(u, rv + sv') = rf(u,v) + sf(u,v') \]

The set of all bilinear functions from $U \times V$ to $W$ is denoted by
$\mathcal{B}(U,V;W)$. A bilinear function $f: U \times V \to F$, with values in
the base field $F$, is called a bilinear form on $U \times V$. □

Example 14.1

1) A real inner product $\langle\,,\rangle: V \times V \to \mathbb{R}$ is a
bilinear form on $V \times V$.

2) If $A$ is an algebra, the product map $\mu: A \times A \to A$ defined by
$\mu(a,b) = ab$ is bilinear. In short, multiplication is linear in each
variable. □

If $V$ is a vector space, we have two classes of functions from
$V \times V$ to $W$, the linear maps $\mathcal{L}(V \times V, W)$ and the bilinear
maps $\mathcal{B}(V,V;W)$. We leave it as an exercise to show that these two
classes of maps have only the zero map in common. In other words, the only
map that is both linear and bilinear is the zero map.
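A small computation shows how bilinearity differs from linearity. The sketch below uses the dot product on $\mathbb{R}^2$, with arbitrary sample vectors and hypothetical helper names. It checks linearity in the first slot and then shows that additivity on $V \times V$, regarded as a vector space of pairs, fails.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(r, u):
    return tuple(r * a for a in u)

u, u2, v, v2 = (1, 2), (3, -1), (0, 5), (2, 2)
r, s = 4, -3

# Bilinearity: linear in the first slot with the second slot held fixed.
left = dot(add(scale(r, u), scale(s, u2)), v)
right = r * dot(u, v) + s * dot(u2, v)

# Linearity on V x V would require f((u,v) + (u2,v2)) = f(u,v) + f(u2,v2).
joint = dot(add(u, u2), add(v, v2))
separate = dot(u, v) + dot(u2, v2)
```

Here `left` and `right` agree, while `joint` and `separate` differ because of the cross terms `dot(u, v2)` and `dot(u2, v)`.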

Now we can define the tensor product of two vector spaces. 

Definition Let $U$ and $V$ be vector spaces over $F$, and let $T$ be the
subspace of the free vector space $\mathcal{F}_{U \times V}$ generated by all
vectors of the form

(14.5) \[ r(u,v) + s(u',v) - (ru + su', v) \]

and

(14.6) \[ r(u,v) + s(u,v') - (u, rv + sv') \]

for all $r, s \in F$, $u, u' \in U$ and $v, v' \in V$. The quotient space
$\mathcal{F}_{U \times V}/T$ is called the tensor product of $U$ and $V$ and is
denoted by $U \otimes V$. □

Note that in the case of the tensor product, we divide by the space
spanned by all vectors in $\mathcal{F}_{U \times V}$ that would be zero if the
vector space operations were linear in each coordinate separately. According
to this definition, an element of $U \otimes V$ has the form

\[ \sum r_i (u_i, v_i) + T \]

It is customary to denote the coset $(u,v) + T$ by $u \otimes v$, and
therefore any element of $U \otimes V$ has the form

\[ \sum u_i \otimes v_i \]

where

(14.7) \[ r(u \otimes v) + s(u' \otimes v) = (ru + su') \otimes v \]

and

(14.8) \[ r(u \otimes v) + s(u \otimes v') = u \otimes (rv + sv') \]

Thus,

\[ \sum u_i \otimes v_i = \sum x_j \otimes y_j \]

if and only if we can obtain one expression from the other by a finite
number of replacements using (14.7) and (14.8).
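The relations (14.7) and (14.8) can be tested in coordinates. The sketch below does not build the quotient space. It assumes the concrete model in which $F^m \otimes F^n$ is identified with $F^{mn}$ and $u \otimes v$ is the list of all products $u_i v_j$, an identification justified by the basis theorem for tensor products. The helper names are hypothetical.

```python
def tensor(u, v):
    """Coordinates of u (x) v in the basis e_i (x) f_j, listed row by row."""
    return tuple(a * b for a in u for b in v)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(r, x):
    return tuple(r * a for a in x)

u, u2 = (1, 2), (0, -1)
v, v2 = (3, 1, 4), (1, 0, 2)
r, s = 2, 5

# (14.7): r(u (x) v) + s(u' (x) v) = (ru + su') (x) v
lhs_147 = add(scale(r, tensor(u, v)), scale(s, tensor(u2, v)))
rhs_147 = tensor(add(scale(r, u), scale(s, u2)), v)

# (14.8): r(u (x) v) + s(u (x) v') = u (x) (rv + sv')
lhs_148 = add(scale(r, tensor(u, v)), scale(s, tensor(u, v2)))
rhs_148 = tensor(u, add(scale(r, v), scale(s, v2)))
```

In this model the distinct pairs `(scale(2, u), v)` and `(u, scale(2, v))` give the same tensor, so the map $(u,v) \mapsto u \otimes v$ is not injective on pairs.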

As with the external direct sum, this definition, while intuitively 
pleasing, can be a bit difficult to work with, so we turn to a 
characterization via a universal property. 

Theorem 14.3 (The universal property of tensor products) Let $U$
and $V$ be vector spaces over the same field $F$. The pair $(U \otimes V, t)$,
where $t: U \times V \to U \otimes V$ is the bilinear map defined by

\[ t(u,v) = u \otimes v \]

has the following property. Referring to Figure 14.6, if $f: U \times V \to W$ is
any bilinear function from $U \times V$ to a vector space $W$ over $F$, then
there is a unique linear transformation $\tau: U \otimes V \to W$ that makes the
diagram in Figure 14.6 commute, that is, for which

\[ \tau \circ t = f \]

Moreover, $U \otimes V$ is unique, in the sense that if a pair $(X, s)$ also has
this property, then $X$ is isomorphic to $U \otimes V$.

Figure 14.6 (commutative triangle: the bilinear map $t: U \times V \to U \otimes V$,
$f: U \times V \to W$, $\tau: U \otimes V \to W$)

Proof. To prove that $(U \otimes V, t)$ has the desired property, consider the
diagram in Figure 14.7.

Figure 14.7 (diagram: $j: U \times V \to \mathcal{F}_{U \times V}$,
$\pi: \mathcal{F}_{U \times V} \to U \otimes V$, $t = \pi \circ j$, with $f$,
$\sigma$ and $\tau$ mapping into $W$)

Since $t(u,v) = u \otimes v = (u,v) + T$, the map $t: U \times V \to U \otimes V$ is
just the composition of the canonical injection
$j: U \times V \to \mathcal{F}_{U \times V}$ followed by the canonical projection
$\pi: \mathcal{F}_{U \times V} \to U \otimes V = \mathcal{F}_{U \times V}/T$. That is,

\[ t = \pi \circ j \]

Now, the universal property of free vector spaces implies that there is a
unique linear transformation $\sigma: \mathcal{F}_{U \times V} \to W$ for which

\[ \sigma \circ j = f \]

Note that, since $f$ is bilinear, $\sigma$ sends any of the vectors (14.5) and
(14.6) that generate $T$ to the zero vector, so $T \subseteq \ker(\sigma)$.
Hence, we may apply Theorem 3.3 to deduce the existence of a unique linear
transformation $\tau: U \otimes V \to W$ for which

\[ \tau \circ \pi = \sigma \]

Hence,

\[ \tau \circ t = \tau \circ \pi \circ j = \sigma \circ j = f \]

Moreover, if $\tau' \circ t = f$, then
$\sigma' = \tau' \circ \pi: \mathcal{F}_{U \times V} \to W$ is a linear
transformation for which

\[ \sigma' \circ j(u,v) = \tau' \circ \pi \circ j(u,v) = \tau' \circ t(u,v) = f(u,v) = \sigma \circ j(u,v) \]

and so $\sigma' \circ j = \sigma \circ j$, implying that $\sigma' = \sigma$, which
in turn implies that $\tau' = \tau$. Hence, $\tau$ is unique. We leave proof of
the uniqueness of $U \otimes V$ as an exercise. ∎



Theorem 14.3 says that to each bilinear function $f: U \times V \to W$,
there corresponds a unique linear function $\tau: U \otimes V \to W$, through
which $f$ can be factored (that is, $f = \tau \circ t$). This establishes a map
$\phi: \mathcal{B}(U,V;W) \to \mathcal{L}(U \otimes V, W)$ given by
$\phi(f) = \tau$. In other words, $\phi(f)$ is the unique linear map for which

\[ \phi(f): U \otimes V \to W, \quad \phi(f)(u \otimes v) = f(u,v) \]

Observe that $\phi$ is linear, since if $f, g \in \mathcal{B}(U,V;W)$, then

\[ [r\phi(f) + s\phi(g)](u \otimes v) = rf(u,v) + sg(u,v) = (rf + sg)(u,v) \]

and so the uniqueness part of the universal property implies that

\[ r\phi(f) + s\phi(g) = \phi(rf + sg) \]

Also, $\phi$ is surjective, since if $\tau: U \otimes V \to W$ is any linear map,
then $f = \tau \circ t: U \times V \to W$ is bilinear, and by the uniqueness part
of the universal property, we have $\phi(f) = \tau$. Finally, $\phi$ is
injective, for if $\phi(f) = 0$, then $f = \phi(f) \circ t = 0$. We have
established the following result.

Theorem 14.4 Let $U$, $V$ and $W$ be vector spaces over $F$. Then the
map $\phi: \mathcal{B}(U,V;W) \to \mathcal{L}(U \otimes V, W)$ defined by the fact
that $\phi(f)$ is the unique linear map for which $f = \phi(f) \circ t$, is an
isomorphism. Thus,

\[ \mathcal{B}(U,V;W) \approx \mathcal{L}(U \otimes V, W) \]

∎
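The correspondence of Theorem 14.4 is easy to see in coordinates when $W = F$. The sketch below assumes the coordinate model $\mathbb{R}^2 \otimes \mathbb{R}^3 = \mathbb{R}^6$, with $u \otimes v$ given by the products $u_i v_j$, and a hypothetical matrix `B` defining a bilinear form. The linear functional corresponding to the form has the flattened matrix as its coefficient vector.

```python
def tensor(u, v):
    """Coordinates of u (x) v in the basis e_i (x) f_j, listed row by row."""
    return tuple(a * b for a in u for b in v)

B = [[1, 2, 0], [-1, 0, 3]]       # hypothetical matrix of a bilinear form on R^2 x R^3

def f(u, v):
    """Bilinear form f(u, v) = sum over i, j of B[i][j] * u_i * v_j."""
    return sum(B[i][j] * u[i] * v[j] for i in range(2) for j in range(3))

flat = tuple(b for row in B for b in row)   # coefficients of phi(f) on R^6

def phi_f(x):
    """The linear functional phi(f) on the 6-dimensional model of the tensor product."""
    return sum(c * a for c, a in zip(flat, x))

u, v = (2, -1), (1, 4, 3)
value = phi_f(tensor(u, v))
```

For these sample vectors `value` equals `f(u, v)`, which is the factorization $f = \phi(f) \circ t$ evaluated at one pair.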
Properties of the Tensor Product

Armed with the definition and the universal property, we can now
discuss some of the basic properties of tensor products.

Theorem 14.5 If $u_1, \ldots, u_n$ are linearly independent vectors in $U$,
and $v_1, \ldots, v_n$ are arbitrary vectors in $V$, then

\[ \sum u_i \otimes v_i = 0 \Rightarrow v_i = 0 \text{ for all } i \]

Proof. Let us consider the dual vectors $\delta_i \in U^*$ to the vectors $u_i$.
Thus, $\delta_i(u_j) = \delta_{ij}$. For any linear functionals
$\epsilon_i: V \to F$, we define a bilinear form $f: U \times V \to F$ by

\[ f(u,v) = \sum_{j=1}^{n} \delta_j(u) \epsilon_j(v) \]

Then, by the universal property of tensor products, there exists a
unique linear functional $\tau: U \otimes V \to F$ for which $\tau \circ t = f$.
Hence,

\[ 0 = \tau\Big( \sum_i u_i \otimes v_i \Big) = \sum_i \tau \circ t(u_i, v_i) = \sum_i f(u_i, v_i) \]
\[ = \sum_i \sum_j \delta_j(u_i) \epsilon_j(v_i) = \sum_i \epsilon_i(v_i) \]

Since the $\epsilon_i$'s are arbitrary, we deduce that $v_i = 0$ for all $i$. ∎

Corollary 14.6 If $u \neq 0$ and $v \neq 0$, then $u \otimes v \neq 0$. ∎

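Theorem 14.7 Let $\mathcal{B} = \{e_i \mid i \in I\}$ be a basis for $U$ and
$\mathcal{C} = \{f_j \mid j \in J\}$ be a basis for $V$. Then the set
$\mathcal{D} = \{e_i \otimes f_j \mid i \in I,\ j \in J\}$ is a basis for
$U \otimes V$.

Proof. To see that $\mathcal{D}$ is linearly independent, suppose that

\[ \sum_{i,j} r_{ij} (e_i \otimes f_j) = 0 \]

This can be written

\[ \sum_i e_i \otimes \Big( \sum_j r_{ij} f_j \Big) = 0 \]

and so, by Theorem 14.5, we must have

\[ \sum_j r_{ij} f_j = 0 \]

for all $i$, and hence $r_{ij} = 0$ for all $i$ and $j$. To see that
$\mathcal{D}$ spans $U \otimes V$, let $u \otimes v \in U \otimes V$. Since
$u = \sum_i r_i e_i$ and $v = \sum_j s_j f_j$, we have

\[ u \otimes v = \sum_i r_i e_i \otimes \sum_j s_j f_j = \sum_j \sum_i r_i s_j (e_i \otimes f_j) = \sum_{i,j} r_i s_j (e_i \otimes f_j) \]

Since any vector in $U \otimes V$ is a finite sum of vectors $u \otimes v$, we
deduce that $\mathcal{D}$ spans $U \otimes V$. ∎

Corollary 14.8 For finite dimensional vector spaces,

\[ \dim(U \otimes V) = \dim(U) \cdot \dim(V) \] ∎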
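The basis theorem and the dimension formula can be seen in the coordinate model, in which $e_i \otimes f_j$ is the list of products of coordinates. This model is an assumption made for illustration. The sketch below checks, for $m = 2$ and $n = 3$, that the products of standard basis vectors are exactly the $mn$ standard basis vectors of $F^{mn}$.

```python
def tensor(u, v):
    """Coordinates of u (x) v in the basis e_i (x) f_j, listed row by row."""
    return tuple(a * b for a in u for b in v)

def unit(n, i):
    """Standard basis vector of length n with a 1 in position i."""
    return tuple(1 if k == i else 0 for k in range(n))

m, n = 2, 3
products = [tensor(unit(m, i), unit(n, j)) for i in range(m) for j in range(n)]
standard = [unit(m * n, k) for k in range(m * n)]
```

The product $e_i \otimes f_j$ lands on position $i \cdot n + j$, so the two lists coincide and there are $m \cdot n$ basis vectors in all.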



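Theorem 14.9 Let $U$ and $V$ be finite dimensional vector spaces.
Then

\[ U^* \otimes V^* \approx (U \otimes V)^* \]

via the isomorphism $\tau: U^* \otimes V^* \to (U \otimes V)^*$ defined by

\[ \tau(\alpha \otimes \beta)(u \otimes v) = \alpha(u)\beta(v) \]

Proof. We must show that $\tau$ is an isomorphism. Let us first fix
$\alpha \in U^*$ and $\beta \in V^*$, and consider the map
$f_{\alpha,\beta}: U \times V \to F$ defined by

\[ f_{\alpha,\beta}(u,v) = \alpha(u)\beta(v) \]

This map is bilinear, and so the universal property of tensor products
implies that there exists a unique linear map
$\phi_{\alpha,\beta}: U \otimes V \to F$ for which

\[ \phi_{\alpha,\beta}(u \otimes v) = \alpha(u)\beta(v) \]

Thus, $\phi_{\alpha,\beta} \in (U \otimes V)^*$. Now we define a map
$\sigma: U^* \times V^* \to (U \otimes V)^*$ by

\[ \sigma(\alpha,\beta) = \phi_{\alpha,\beta} \]

This map is also bilinear. For instance,

\[ \sigma(r\alpha + s\beta, \gamma)(u \otimes v) = (r\alpha + s\beta)(u)\gamma(v) \]
\[ = r\alpha(u)\gamma(v) + s\beta(u)\gamma(v) \]
\[ = r\sigma(\alpha,\gamma)(u \otimes v) + s\sigma(\beta,\gamma)(u \otimes v) \]
\[ = [r\sigma(\alpha,\gamma) + s\sigma(\beta,\gamma)](u \otimes v) \]

and so

\[ \sigma(r\alpha + s\beta, \gamma) = r\sigma(\alpha,\gamma) + s\sigma(\beta,\gamma) \]

which shows that $\sigma$ is linear in its first coordinate. Hence, the
universal property implies that there exists a unique linear map
$\tau: U^* \otimes V^* \to (U \otimes V)^*$ for which

\[ \tau(\alpha \otimes \beta) = \sigma(\alpha,\beta) \]

that is,

\[ \tau(\alpha \otimes \beta)(u \otimes v) = \sigma(\alpha,\beta)(u \otimes v) = \phi_{\alpha,\beta}(u \otimes v) = \alpha(u)\beta(v) \]

To show that $\tau$ is an isomorphism, let $\mathcal{B} = \{b_i\}$ be a basis for
$U$, with dual basis $\mathcal{B}' = \{\beta_i\}$, and let
$\mathcal{C} = \{c_j\}$ be a basis for $V$, with dual basis
$\mathcal{C}' = \{\gamma_j\}$. Then

\[ \tau(\beta_i \otimes \gamma_j)(b_u \otimes c_v) = \beta_i(b_u)\gamma_j(c_v) = \delta_{i,u}\delta_{j,v} = \delta_{(i,j),(u,v)} \]

and so $\tau(\beta_i \otimes \gamma_j) \in (U \otimes V)^*$ is a dual basis vector
to the basis $\{b_u \otimes c_v\}$ for $U \otimes V$. Thus, $\tau$ takes the basis
$\{\beta_i \otimes \gamma_j\}$ for $U^* \otimes V^*$ to the basis
$\{\tau(\beta_i \otimes \gamma_j)\}$ for $(U \otimes V)^*$. Hence, $\tau$ is an
isomorphism. ∎

Combining the isomorphisms of Theorem 14.4 and Theorem 14.9,
we have, for finite dimensional vector spaces $U$ and $V$,

\[ U^* \otimes V^* \approx (U \otimes V)^* \approx \mathcal{B}(U,V;F) \]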

The Tensor Product of Linear Transformations 

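Let $\tau: V \to V'$ and $\sigma: W \to W'$ be linear transformations. Then
there is a unique linear transformation
$(\tau \odot \sigma): V \otimes W \to V' \otimes W'$ satisfying

(14.9) \[ (\tau \odot \sigma)(v \otimes w) = \tau(v) \otimes \sigma(w) \]

To see this, observe that the function $f: V \times W \to V' \otimes W'$ defined
by $f(v,w) = \tau(v) \otimes \sigma(w)$ is bilinear, and so by the universal
property of tensor products, there exists a unique linear transformation
$\tau \odot \sigma$ for which (14.9) holds. The map $\tau \odot \sigma$ is called
the tensor product of $\tau$ and $\sigma$.

Thus, we have a map
$\phi: \mathcal{L}(V,V') \times \mathcal{L}(W,W') \to \mathcal{L}(V \otimes W, V' \otimes W')$
defined by

(14.10) \[ \phi(\tau,\sigma) = \tau \odot \sigma \]

This map is bilinear and so there is a unique linear transformation

\[ \theta: \mathcal{L}(V,V') \otimes \mathcal{L}(W,W') \to \mathcal{L}(V \otimes W, V' \otimes W') \]

satisfying $\theta(\tau \otimes \sigma) = \tau \odot \sigma$.

We propose to show that $\theta$ is injective. Observe that any
nonzero vector $\xi \in \mathcal{L}(V,V') \otimes \mathcal{L}(W,W')$ has the form

\[ \xi = \sum_{i=1}^{n} \tau_i \otimes \sigma_i \]

where the $\tau_i$'s are linearly independent, and the $\sigma_i$'s are linearly
independent. To show that $\ker(\theta) = \{0\}$, suppose that

\[ \theta(\xi) = \sum_{i=1}^{n} \tau_i \odot \sigma_i = 0 \]

Then

(14.11) \[ \sum_{i=1}^{n} \tau_i(v) \otimes \sigma_i(w) = 0 \]

for all $v \in V$ and $w \in W$. Let us choose $v \in V$ so that
$\tau_1(v) \neq 0$, and suppose (by renumbering if necessary) that
$\tau_1(v), \ldots, \tau_k(v)$ is a maximal linearly independent set among
$\tau_1(v), \ldots, \tau_n(v)$. Thus,

\[ \tau_u(v) = \sum_{j=1}^{k} r_{uj} \tau_j(v) \]

for $u = k+1, \ldots, n$. Hence, (14.11) gives

\[ 0 = \sum_{i=1}^{k} \tau_i(v) \otimes \sigma_i(w) + \sum_{u=k+1}^{n} \Big[ \sum_{j=1}^{k} r_{uj} \tau_j(v) \Big] \otimes \sigma_u(w) \]
\[ = \sum_{i=1}^{k} \tau_i(v) \otimes \sigma_i(w) + \sum_{j=1}^{k} \tau_j(v) \otimes \Big[ \sum_{u=k+1}^{n} r_{uj} \sigma_u(w) \Big] \]
\[ = \sum_{i=1}^{k} \tau_i(v) \otimes \Big[ \sigma_i(w) + \sum_{u=k+1}^{n} r_{ui} \sigma_u(w) \Big] \]

and since $\tau_1(v), \ldots, \tau_k(v)$ are linearly independent, Theorem 14.5
implies that

\[ \sigma_i(w) + \sum_{u=k+1}^{n} r_{ui} \sigma_u(w) = 0 \]

for all $i = 1, \ldots, k$, and all $w \in W$. Hence,

\[ \sigma_i + \sum_{u=k+1}^{n} r_{ui} \sigma_u = 0 \]

which is in contradiction to the fact that the $\sigma_i$'s are linearly
independent. Hence, $\theta(\xi) \neq 0$ and so $\theta$ is injective.

Note that if all vector spaces are finite dimensional, then $\theta$ is
also surjective, and hence is an isomorphism. In any case, the fact that
$\theta: \tau \otimes \sigma \mapsto \tau \odot \sigma$ is injective motivates the
commonly used notation $\tau \otimes \sigma$ for the tensor product
$\tau \odot \sigma$. Let us summarize.

Theorem 14.10 Let $\tau \in \mathcal{L}(V,V')$ and $\sigma \in \mathcal{L}(W,W')$.
There is a unique linear transformation
$\tau \odot \sigma \in \mathcal{L}(V \otimes W, V' \otimes W')$, called the
tensor product of $\tau$ and $\sigma$, satisfying

\[ (\tau \odot \sigma)(v \otimes w) = \tau(v) \otimes \sigma(w) \]

Moreover, there is a (unique) injective linear transformation

\[ \theta: \mathcal{L}(V,V') \otimes \mathcal{L}(W,W') \to \mathcal{L}(V \otimes W, V' \otimes W') \]

satisfying $\theta(\tau \otimes \sigma) = \tau \odot \sigma$. In case all vector
spaces are finite dimensional, $\theta$ is an isomorphism. ∎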
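In coordinates, the tensor product of two linear maps is represented by the Kronecker product of their matrices. That is a standard fact about the bases $e_i \otimes f_j$ and is not stated in the text. The sketch below, with hypothetical matrices and helper names, checks the defining identity of the theorem on one pair of vectors.

```python
def tensor(v, w):
    """Coordinates of v (x) w in the product basis, listed row by row."""
    return [a * b for a in v for b in w]

def matvec(M, x):
    return [sum(m * a for m, a in zip(row, x)) for row in M]

def kron(A, B):
    """Kronecker product: matrix of the tensor product map in the product bases."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

A = [[1, 2], [0, -1]]         # tau: R^2 -> R^2
B = [[3, 0, 1], [1, 1, 0]]    # sigma: R^3 -> R^2
v, w = [2, 5], [1, -1, 4]

lhs = matvec(kron(A, B), tensor(v, w))         # (tau (x) sigma)(v (x) w)
rhs = tensor(matvec(A, v), matvec(B, w))       # tau(v) (x) sigma(w)
```

Both sides come out as the same vector in $\mathbb{R}^4$. Since the vectors $v \otimes w$ span $V \otimes W$, agreement on all of them determines the map, which is the uniqueness part of the theorem.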



Change of Base Field 

We have seen in earlier chapters that a linear operator $\tau$, defined
on a real $n$-dimensional vector space $V$, may not have $n$ eigenvalues
(counting multiplicity), since its characteristic polynomial may not split
over $\mathbb{R}$. On the other hand, a linear operator on a complex
$n$-dimensional vector space does have $n$ eigenvalues. This leads
us to wonder whether we can extend a real vector space to a complex
vector space, and correspondingly extend a real operator to a complex
operator.

Let us approach this question in more generality. For
convenience, we refer to a vector space over a field $F$ as an $F$-space.
There are several approaches to "upgrading" the base field of a vector
space. For instance, suppose that $V$ is an $F$-space, and that $F'$ is an
extension field of $F$, that is, $F' \supseteq F$. If $\{b_i\}$ is a basis for
$V$, then every element $x$ of $V$ has the form

\[ x = \sum r_i b_i \]

where $r_i \in F$. We can define an $F'$-space $V'$ simply by taking all
formal linear combinations of the form

\[ x' = \sum r_i b_i \]

where $r_i \in F'$. In other words, $V'$ is the free $F'$-space on the set
$\{b_i\}$. Note that the dimension of $V'$ as an $F'$-space is the same as the
dimension of $V$ as an $F$-space. Also, $V'$ is an $F$-space (just restrict
the scalars to $F$), and as such, the inclusion map $j: V \to V'$ sending
$x \in V$ to $j(x) = x \in V'$, is an $F$-monomorphism.

The approach described in the previous paragraph uses an
arbitrarily chosen basis for $V$, and is therefore not coordinate free.
However, we can give a coordinate-free approach using tensor products
as follows. If $V$ is an $F$-space, let

\[ V' = F' \otimes_F V \]

It is customary to include the subscript $F$ on $\otimes_F$ to denote the fact
that the tensor product is taken with respect to the base field $F$. (All
relevant maps are $F$-bilinear and $F$-linear.) However, since we will not
take tensor products with respect to any other field, we will not always
use this notation.

The vector space $V'$ is an $F$-space by definition of tensor product,
but we may make it into an $F'$-space as follows. Fix an $s' \in F'$, and
consider the map $f_{s'}: (F' \times V) \to (F' \otimes_F V)$ defined by

\[ f_{s'}(r',v) = s'r' \otimes v \]

Since $f_{s'}$ is bilinear, the universal property of tensor products
implies that there is a unique $F$-linear map
$\tau_{s'}: (F' \otimes_F V) \to (F' \otimes_F V)$ for which

\[ \tau_{s'}(r' \otimes v) = s'r' \otimes v \]

This map is intended to be multiplication by the scalar $s' \in F'$. Note
that, since $\tau_{s'}$ is $F$-linear, it is additive, and so

\[ \tau_{s'}(r' \otimes v + u' \otimes w) = \tau_{s'}(r' \otimes v) + \tau_{s'}(u' \otimes w) \]

that is,

\[ s'(r' \otimes v + u' \otimes w) = s'(r' \otimes v) + s'(u' \otimes w) \]

Since all of the defining properties of scalar multiplication are satisfied,
$V'$ is indeed an $F'$-space.

It is not hard to see that if $\{b_i\}$ is a basis for the $F$-space $V$,
then $\{1 \otimes b_i\}$ is a basis for the $F'$-space $V'$, and so the
dimension of the $F'$-space $V'$ is equal to the dimension of the $F$-space $V$.

The map $\nu: V \to V'$ defined by $\nu(v) = 1 \otimes v$ is easily seen to be
an $F$-monomorphism, and so the $F$-space $V'$ contains an isomorphic
copy of $V$. The $F$-linear monomorphism $\nu$ is sometimes called the
$F'$-extension map of $V$. This map has a universal property of its own, as
described in the next theorem.

Theorem 14.11 Let $\nu: V \to V' = F' \otimes_F V$ be the $F'$-extension map of
an $F$-space $V$. Then $\nu$ has the following universal property. For any
$F$-linear map $f: V \to W'$, where $W'$ is any $F'$-space, there exists a unique
$F'$-linear map $\tau: V' \to W'$ for which the diagram in Figure 14.8 is
commutative, that is,

\[ \tau \circ \nu = f \]

Proof. If such a map $\tau: F' \otimes_F V \to W'$ is to exist, then it must
satisfy

(14.12) \[ \tau(r' \otimes v) = r'\tau(1 \otimes v) = r'f(v) \]

This shows that, if $\tau$ exists, it is uniquely determined by $f$. To see
that $\tau$ exists, consider the map $g: (F' \times V) \to W'$ defined by

\[ g(r',v) = r'f(v) \]

Since this is $F$-bilinear, there exists a unique $F$-linear map $\tau$ for which
(14.12) holds. It is easy to see that $\tau$ is also $F'$-linear, since

\[ \tau[s'(r' \otimes v)] = \tau(s'r' \otimes v) = s'r'f(v) = s'\tau(r' \otimes v) \] ∎

Figure 14.8 (commutative triangle: $\nu: V \to V' = F' \otimes_F V$,
$f: V \to W'$, $\tau: V' \to W'$)



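Theorem 14.11 is the key to describing how to extend an $F$-linear
map to an $F'$-linear map.

Theorem 14.12 Let $V$ and $W$ be $F$-spaces, with $F'$-extension maps
$\nu$ and $\mu$, respectively. (See Figure 14.9.) Then for any $F$-linear map
$\tau: V \to W$, the map $\tau' = \iota_{F'} \otimes \tau: V' \to W'$ is the
unique $F'$-linear map that makes the diagram in Figure 14.9 commutative, that
is, for which

\[ \mu \circ \tau = \tau' \circ \nu \]

Proof. The map $\mu \circ \tau$ is an $F$-linear map from the $F$-space $V$ to
the $F'$-space $W'$. Hence, Theorem 14.11 shows that there is a unique
$F'$-linear map $\tau': V' \to W'$ such that

\[ \mu \circ \tau = \tau' \circ \nu \]

To see that $\tau' = \iota_{F'} \otimes \tau$, observe that

\[ \tau'(r' \otimes v) = r'\tau'(1 \otimes v) = r'(\tau' \circ \nu)(v) = r'(\mu \circ \tau)(v) \]
\[ = r'(1 \otimes \tau(v)) = r' \otimes \tau(v) = (\iota_{F'} \otimes \tau)(r' \otimes v) \] ∎

Figure 14.9 (commutative square: the $F$-linear map $\tau: V \to W$,
$\nu: V \to V'$, $\mu: W \to W' = F' \otimes_F W$, $\tau': V' \to W'$)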
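The motivating case $F = \mathbb{R}$, $F' = \mathbb{C}$ can be worked through with a rotation. The sketch below assumes the coordinate model $\mathbb{C} \otimes_{\mathbb{R}} \mathbb{R}^2 = \mathbb{C}^2$, in which the extension of a real operator is given by the same matrix acting on complex vectors. The matrix `R` is a hypothetical example.

```python
def matvec(M, x):
    return [sum(m * a for m, a in zip(row, x)) for row in M]

R = [[0, -1], [1, 0]]     # rotation by 90 degrees on R^2

# The real characteristic polynomial x^2 - (trace)x + det has negative
# discriminant, so the operator has no real eigenvalues.
trace = R[0][0] + R[1][1]
det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
discriminant = trace * trace - 4 * det

# In the complex extension, the same matrix acts on complex vectors.
z = [1, -1j]
image = matvec(R, z)      # equals i times z, so i is an eigenvalue
```

The eigenvector `z` does not lie in the copy of $\mathbb{R}^2$ inside $\mathbb{C}^2$, that is, it is not of the form $1 \otimes v$. This is why the eigenvalue only appears after the change of base field.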



Multilinear Maps and Iterated Tensor Products 

The tensor product operation can easily be extended to more than
two vector spaces. We begin with the extension of the concept of
bilinearity.

Definition If $V_1, \ldots, V_n$ and $W$ are vector spaces over $F$, a function
$f: V_1 \times \cdots \times V_n \to W$ is said to be multilinear if it is linear
in each variable separately, that is, if

\[ f(u_1, \ldots, u_{k-1}, rv + sv', u_{k+1}, \ldots, u_n) = \]
\[ rf(u_1, \ldots, u_{k-1}, v, u_{k+1}, \ldots, u_n) + sf(u_1, \ldots, u_{k-1}, v', u_{k+1}, \ldots, u_n) \]

for all $k = 1, \ldots, n$. A multilinear function of $n$ variables is also
referred to as an $n$-linear function. The set of all multilinear functions
will be denoted by $\mathrm{Mul}(V_1, \ldots, V_n; W)$. A multilinear function
from $V_1 \times \cdots \times V_n$ to the base field $F$ is called a multilinear
form (or $n$-form). □

Example 14.2

1) If $A$ is an algebra then the product map $\mu: A \times \cdots \times A \to A$
defined by $\mu(a_1, \ldots, a_n) = a_1 \cdots a_n$ is $n$-linear.

2) The determinant function is an $n$-linear form on the
columns of the $n \times n$ matrices over $F$. □
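Linearity of the determinant in a single column can be checked directly. The sketch below uses a $3 \times 3$ cofactor expansion and arbitrary integer columns, so the arithmetic is exact. The function names are hypothetical.

```python
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix with columns c1, c2, c3 (first-row expansion)."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))

def combo(r, v, s, w):
    """The linear combination r*v + s*w of two column vectors."""
    return tuple(r * a + s * b for a, b in zip(v, w))

c1, c3 = (1, 0, 2), (4, 1, 1)
v, w = (2, 3, -1), (0, 5, 2)
r, s = 3, -2

# Linearity in the second column, with the first and third columns held fixed.
lhs = det3(c1, combo(r, v, s, w), c3)
rhs = r * det3(c1, v, c3) + s * det3(c1, w, c3)
```

The same check works in any of the three column positions, which is what it means for the determinant to be 3-linear in the columns.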

Definition Let $V_1, \ldots, V_n$ be vector spaces over $F$, and let $T$ be the
subspace of the free vector space $\mathcal{F}$ on $V_1 \times \cdots \times V_n$
generated by all vectors of the form

\[ r(v_1, \ldots, v_{k-1}, u, v_{k+1}, \ldots, v_n) + s(v_1, \ldots, v_{k-1}, u', v_{k+1}, \ldots, v_n) \]
\[ - (v_1, \ldots, v_{k-1}, ru + su', v_{k+1}, \ldots, v_n) \]

for all $k = 1, \ldots, n$, all $r, s \in F$, all $u, u' \in V_k$ and all
$v_i \in V_i$. The quotient space $\mathcal{F}/T$ is called the tensor product of
$V_1, \ldots, V_n$, and denoted by $V_1 \otimes \cdots \otimes V_n$. □

As before, we denote the coset $(v_1, \ldots, v_n) + T$ by
$v_1 \otimes \cdots \otimes v_n$, and so any element of
$V_1 \otimes \cdots \otimes V_n$ has the form

\[ \sum v_{i_1} \otimes \cdots \otimes v_{i_n} \]

where the vector space operations are linear in each variable.

The tensor product can also be characterized by a universal 
property. 

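Theorem 14.13 (The universal property of tensor products) Let
$V_1, \ldots, V_n$ be vector spaces over the field $F$. The pair
$(V_1 \otimes \cdots \otimes V_n, t)$, where
$t: V_1 \times \cdots \times V_n \to V_1 \otimes \cdots \otimes V_n$ is the
multilinear map defined by

\[ t(v_1, \ldots, v_n) = v_1 \otimes \cdots \otimes v_n \]

has the following property. Referring to Figure 14.10, if
$f: V_1 \times \cdots \times V_n \to W$ is any multilinear function from
$V_1 \times \cdots \times V_n$ to a vector space $W$ over $F$, then there is a
unique linear transformation $\tau: V_1 \otimes \cdots \otimes V_n \to W$ that
makes the diagram in Figure 14.10 commute, that is, for which

\[ \tau \circ t = f \]

Moreover, $V_1 \otimes \cdots \otimes V_n$ is unique in the sense that if a pair
$(X, s)$ also has this property, then $X$ is isomorphic to
$V_1 \otimes \cdots \otimes V_n$. ∎

Figure 14.10 (commutative triangle: the multilinear map
$t: V_1 \times \cdots \times V_n \to V_1 \otimes \cdots \otimes V_n$,
$f: V_1 \times \cdots \times V_n \to W$, $\tau: V_1 \otimes \cdots \otimes V_n \to W$)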

Here are some of the basic properties of multiple tensor products. 

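Theorem 14.14 The tensor product has the following properties. Note
that all vector spaces are over the same field $F$.

1) (Associativity) There exists an isomorphism

\[ \tau: (V_1 \otimes \cdots \otimes V_n) \otimes (W_1 \otimes \cdots \otimes W_m) \to V_1 \otimes \cdots \otimes V_n \otimes W_1 \otimes \cdots \otimes W_m \]

for which

\[ \tau[(v_1 \otimes \cdots \otimes v_n) \otimes (w_1 \otimes \cdots \otimes w_m)] = v_1 \otimes \cdots \otimes v_n \otimes w_1 \otimes \cdots \otimes w_m \]

In particular,

\[ (U \otimes V) \otimes W \approx U \otimes (V \otimes W) \approx U \otimes V \otimes W \]

2) (Commutativity) Let $\pi$ be any permutation of the indices
$\{1, \ldots, n\}$. Then there is an isomorphism

\[ \sigma: V_1 \otimes \cdots \otimes V_n \to V_{\pi(1)} \otimes \cdots \otimes V_{\pi(n)} \]

for which

\[ \sigma(v_1 \otimes \cdots \otimes v_n) = v_{\pi(1)} \otimes \cdots \otimes v_{\pi(n)} \]

3) There is an isomorphism $\rho_1: F \otimes V \to V$ for which

\[ \rho_1(r \otimes v) = rv \]

and similarly, there is an isomorphism $\rho_2: V \otimes F \to V$ for which

\[ \rho_2(v \otimes r) = rv \]

Hence, $F \otimes V \approx V \approx V \otimes F$. ∎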

The analog of Theorem 14.4 is the following.

Theorem 14.15 Let $V_1,\ldots,V_n$ and $W$ be vector spaces over $F$.
Then the map
$\phi\colon \mathrm{Mul}(V_1,\ldots,V_n;W) \to \mathcal{L}(V_1 \otimes \cdots \otimes V_n, W)$,
defined by the fact that $\phi(f)$ is the unique linear map for which
$f = \phi(f) \circ t$, is an isomorphism. Thus,

$$\mathrm{Mul}(V_1,\ldots,V_n;W) \approx \mathcal{L}(V_1 \otimes \cdots \otimes V_n, W)$$

Moreover, if all vector spaces are finite dimensional, then

$$\dim[\mathrm{Mul}(V_1,\ldots,V_n;W)] = \dim(V_1) \cdots \dim(V_n) \cdot \dim(W) \qquad ∎$$
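
The correspondence in Theorem 14.15 can be checked in coordinates. The following is a minimal sketch, assuming $V_1 = \mathbb{Q}^2$, $V_2 = \mathbb{Q}^3$, $W = \mathbb{Q}^2$ with standard bases, and identifying $V_1 \otimes V_2$ with $\mathbb{Q}^6$ through the Kronecker product of coordinate vectors. The helper names (`kron`, `linearize`, `apply`) and the sample bilinear map `f` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def kron(u, v):
    """Coordinates of u (x) v in F^(m*n): entry (i, j) is u[i] * v[j]."""
    return [ui * vj for ui in u for vj in v]

def basis(n, i):
    """i-th standard basis vector of F^n."""
    return [Fraction(int(j == i)) for j in range(n)]

def linearize(f, m, n):
    """Matrix (list of rows) of the linear map phi(f) on F^(m*n).

    Column (i, j) is f(e_i, e_j), so phi(f)(e_i (x) e_j) = f(e_i, e_j).
    """
    cols = [f(basis(m, i), basis(n, j)) for i, j in product(range(m), range(n))]
    p = len(cols[0])
    return [[col[r] for col in cols] for r in range(p)]

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

# A sample bilinear map f: F^2 x F^3 -> F^2 (an assumption of this sketch).
def f(u, v):
    return [u[0] * v[1] - 2 * u[1] * v[2], u[0] * v[0] + u[1] * v[1]]

T = linearize(f, 2, 3)
u = [Fraction(3), Fraction(-1)]
v = [Fraction(2), Fraction(5), Fraction(7)]
assert apply(T, kron(u, v)) == f(u, v)      # f = phi(f) o t on this input
assert len(T) * len(T[0]) == 2 * 3 * 2      # dim V_1 * dim V_2 * dim W
```

The matrix `T` has $\dim(W)$ rows and $\dim(V_1)\dim(V_2)$ columns, which matches the dimension count in the theorem.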

Alternating Maps and Exterior Products 

We will use the notation $V^n$ to denote the Cartesian product of
$V$ with itself $n$ times, and $\bigotimes^n V$ to denote the n-fold tensor
product.

The following definitions describe some special types of multilinear
maps.

Definition

1) A multilinear map $f\colon V^n \to W$ is symmetric if

$$f(v_1,\ldots,v_i,\ldots,v_j,\ldots,v_n) = f(v_1,\ldots,v_j,\ldots,v_i,\ldots,v_n)$$

for any $i \ne j$.

2) A multilinear map $f\colon V^n \to W$ is skew-symmetric if

$$f(v_1,\ldots,v_i,\ldots,v_j,\ldots,v_n) = -f(v_1,\ldots,v_j,\ldots,v_i,\ldots,v_n)$$

for $i \ne j$.

3) A multilinear map $f\colon V^n \to W$ is alternating if

$$f(v_1,\ldots,v_n) = 0$$

whenever any two of the vectors $v_i$ are equal. □

A few remarks about permutations, with which the reader may
very well be familiar, are in order. A permutation of the set
$N = \{1,\ldots,n\}$ is a bijective function $\pi\colon N \to N$. We denote the
set of all such permutations by $S_n$. This is the symmetric group on n
symbols. A cycle of length k is a permutation of the form
$(i_1,i_2,\ldots,i_k)$, that sends $i_1$ to $i_2$, $i_2$ to $i_3$, ...,
$i_{k-1}$ to $i_k$ and $i_k$ to $i_1$. (We assume that $i_u \ne i_v$ for
$u \ne v$.) All other elements of $N$ are left fixed. Every
permutation is the product (composition) of disjoint cycles.

A transposition is a cycle $(i,j)$ of length 2. Every cycle (and
therefore every permutation) is the product of transpositions. In
general, a permutation can be expressed as a product of transpositions
in many ways. However, no matter how one represents a given
permutation as such a product, the number of transpositions is either
always even or always odd. Therefore, we can define the parity of a
permutation $\pi \in S_n$ to be the parity of the number of transpositions in
any decomposition of $\pi$ as a product of transpositions. The sign of a
permutation is defined by

$$\mathrm{sg}(\pi) = (-1)^{\mathrm{parity}(\pi)}$$

Thus, $\mathrm{sg}(\pi) = 1$ if $\pi$ is an even permutation, and $-1$ if $\pi$ is an odd
permutation. The sign of $\pi$ is often written $(-1)^\pi$.
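
The sign can be computed from the disjoint cycle decomposition, since a cycle of length k is a product of k - 1 transpositions. The sketch below assumes permutations of $\{0,\ldots,n-1\}$ (0-indexed, unlike the text) stored as tuples with `perm[i]` the image of `i`; the function names are hypothetical.

```python
from itertools import permutations

def cycles(perm):
    """Disjoint cycles of a permutation of {0..n-1}, where i maps to perm[i]."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        result.append(tuple(cycle))
    return result

def sign(perm):
    """sg(perm), using that a cycle of length k is a product of k - 1 transpositions."""
    transpositions = sum(len(c) - 1 for c in cycles(perm))
    return -1 if transpositions % 2 else 1

def compose(p, q):
    """(p o q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

S3 = list(permutations(range(3)))
assert sorted(sign(p) for p in S3) == [-1, -1, -1, 1, 1, 1]
# The sign is multiplicative, as the well-definedness of parity requires.
assert all(sign(compose(p, q)) == sign(p) * sign(q) for p in S3 for q in S3)
```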

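With these facts in mind, it is apparent that f is symmetric if
and only if

$$f(v_1,\ldots,v_n) = f(v_{\pi(1)},\ldots,v_{\pi(n)})$$

for all permutations $\pi \in S_n$ and that f is skew-symmetric if and only if

$$f(v_1,\ldots,v_n) = (-1)^{\pi} f(v_{\pi(1)},\ldots,v_{\pi(n)})$$

for all permutations $\pi \in S_n$.

If f is a multilinear function, then

$$f(\ldots,v_i+v_j,\ldots,v_i+v_j,\ldots) = f(\ldots,v_i,\ldots,v_i,\ldots) + f(\ldots,v_i,\ldots,v_j,\ldots) + f(\ldots,v_j,\ldots,v_i,\ldots) + f(\ldots,v_j,\ldots,v_j,\ldots)$$

Hence, if f is alternating, then it is also skew-symmetric. On the other
hand, if f is skew-symmetric, we have

$$f(v_1,\ldots,v_i,\ldots,v_i,\ldots,v_n) = -f(v_1,\ldots,v_i,\ldots,v_i,\ldots,v_n)$$

and so, provided that $\mathrm{char}(F) \ne 2$, this gives

$$f(v_1,\ldots,v_i,\ldots,v_i,\ldots,v_n) = 0$$

and so f is alternating.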
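
The role of the hypothesis $\mathrm{char}(F) \ne 2$ can be seen in a very small case. The sketch below is an illustration that is not taken from the text: it uses the bilinear form $f(x,y) = xy$ on the one-dimensional space $\mathbb{F}_p$, with arithmetic modulo a prime `p`.

```python
def f(x, y, p):
    """The bilinear form f(x, y) = x*y on the one-dimensional space F_p."""
    return (x * y) % p

def is_skew_symmetric(p):
    return all(f(x, y, p) == (-f(y, x, p)) % p for x in range(p) for y in range(p))

def is_alternating(p):
    return all(f(x, x, p) == 0 for x in range(p))

# Over F_2 we have -1 = 1, so this symmetric form is also skew-symmetric,
# yet f(1, 1) = 1, so it is not alternating.
assert is_skew_symmetric(2) and not is_alternating(2)
# Over F_3 the same form is not even skew-symmetric.
assert not is_skew_symmetric(3)
```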

We have discussed symmetric and alternating bilinear functions in
Chapter 11. Our intention here is to briefly discuss alternating
multilinear functions, which play an especially important role in
differential geometry and its applications.

Definition Let $V$ be a vector space over a field $F$ with $\mathrm{char}(F) \ne 2$,
and let $\bigotimes^n V$ be the n-fold tensor product of $V$ with itself. Let $U$
be the subspace of $\bigotimes^n V$ generated by all elements of the form

$$(v_1 \otimes \cdots \otimes v_i \otimes \cdots \otimes v_j \otimes \cdots \otimes v_n) + (v_1 \otimes \cdots \otimes v_j \otimes \cdots \otimes v_i \otimes \cdots \otimes v_n)$$

for all $i < j$. The quotient space $(\bigotimes^n V)/U$ is called the nth exterior
product space of $V$ and is denoted by

$$\textstyle\bigwedge^n V \quad\text{or}\quad \underbrace{V \wedge \cdots \wedge V}_{n\ \text{factors}} \qquad \square$$

It is customary to denote the coset $(v_1 \otimes \cdots \otimes v_n)+U$ by
$v_1 \wedge \cdots \wedge v_n$ and refer to $\wedge$ as the wedge product. Thus, any
element of $\bigwedge^n V$ has the form

$$\sum v_{i_1} \wedge \cdots \wedge v_{i_n}$$

where the vector space operations are linear in each variable, and where
the interchange of any two variables introduces a minus sign.

The exterior product can also be characterized by a universal 
property. 

Theorem 14.16 (The universal property of exterior products) Let
$V$ be a vector space over a field $F$ with $\mathrm{char}(F) \ne 2$. The pair
$(\bigwedge^n V, a)$, where $a\colon V^n \to \bigwedge^n V$ is the
alternating multilinear map defined by

$$a(v_1,\ldots,v_n) = v_1 \wedge \cdots \wedge v_n$$

has the following property. Referring to Figure 14.11, if
$f\colon V^n \to W$ is any alternating multilinear function from
$V^n$ to a vector space $W$ over $F$, then there is a unique
linear transformation $\tau\colon \bigwedge^n V \to W$ that makes the diagram in
Figure 14.11 commute, that is, for which

$$\tau \circ a = f$$

Moreover, $\bigwedge^n V$ is unique in the sense that if a pair $(X,\sigma)$ also
has this property, then $X$ is isomorphic to $\bigwedge^n V$. ∎

EXERCISES

1. Verify that the set x V ^ vector space. 

2. Show that if $\tau\colon W \to X$ is a linear map, and $b\colon U \times V \to W$ is
bilinear, then $\tau \circ b\colon U \times V \to X$ is bilinear.

3. Show that the only map that is both linear and n-linear (for
$n \ge 2$) is the zero map.

4. Find an example of a bilinear map $\tau\colon V \times V \to W$ whose image
$\mathrm{im}(\tau) = \{\tau(u,v) \mid u,v \in V\}$ is not a subspace of $W$.

5. Prove that $U \otimes V \approx V \otimes U$.

6. Let $X$ and $Y$ be nonempty sets. Use the universal property of
tensor products to prove that
$\mathcal{F}_{X \times Y} \approx \mathcal{F}_X \otimes \mathcal{F}_Y$.

7. Let $u,u' \in U$ and $v,v' \in V$. Assuming that $u \otimes v \ne 0$, show that
$u \otimes v = u' \otimes v'$ if and only if $u' = ru$ and $v' = r^{-1}v$, for $r \ne 0$.

8. Let $\mathcal{B} = \{b_i\}$ be a basis for $U$ and $\mathcal{C} = \{c_j\}$ be a basis for $V$.
Show that any function $f\colon \mathcal{B} \times \mathcal{C} \to W$ can be extended to a linear
function $\bar{f}\colon U \otimes V \to W$. Deduce that the function $f$ can be
extended in a unique way to a bilinear map $\hat{f}\colon U \times V \to W$. Show
that all bilinear maps are obtained in this way.

9. Let $S_1,S_2$ be subspaces of $U$. Show that

$$(S_1 \otimes V) \cap (S_2 \otimes V) \approx (S_1 \cap S_2) \otimes V$$

10. Let $S \subseteq U$ and $T \subseteq V$ be subspaces of vector spaces $U$ and $V$,
respectively. Show that

$$(S \otimes V) \cap (U \otimes T) \approx S \otimes T$$

11. Let $S_1,S_2 \subseteq U$ and $T_1,T_2 \subseteq V$ be subspaces of $U$ and $V$,
respectively. Show that

$$(S_1 \otimes T_1) \cap (S_2 \otimes T_2) \approx (S_1 \cap S_2) \otimes (T_1 \cap T_2)$$

12. Find an example of two vector spaces $U$ and $V$ and a nonzero
vector $x \in U \otimes V$ that has at least two distinct (not including
order of the terms) representations of the form

$$x = \sum_{i=1}^{n} u_i \otimes v_i$$

where the $u_i$'s are linearly independent, and so are the $v_i$'s.
However, prove that the number n of terms is the same for all
such representations.

13. What is the dimension of the space $\mathcal{B}(U,V;F)$ of all bilinear
forms on $U \times V$? (Assume $U$ and $V$ are finite dimensional.)

14. Let $\iota_X$ denote the identity operator on a vector space $X$. Prove
that $\iota_V \odot \iota_W = \iota_{V \otimes W}$.

15. Suppose that $\tau_1\colon U \to V$, $\tau_2\colon V \to W$, and
$\sigma_1\colon U' \to V'$, $\sigma_2\colon V' \to W'$. Prove that

$$(\tau_2 \circ \tau_1) \odot (\sigma_2 \circ \sigma_1) = (\tau_2 \odot \sigma_2) \circ (\tau_1 \odot \sigma_1)$$

16. Let $V$ be an $F$-space, and $F' \supseteq F$. Prove that if $\{b_i\}$ is a basis
for the $F$-space $V$, then $\{1 \otimes b_i\}$ is a basis for the $F'$-space $V'$.

17. Connect the two approaches to extending the base field of an
$F$-space $V$ to $F'$ (at least in the finite dimensional case) by
showing that $F^n \otimes_F F' \approx (F')^n$.

18. Prove that any permutation $\pi \in S_n$ is the product of disjoint
cycles. Then prove that any cycle is the product of transpositions.

19. Prove that if $\pi \in S_n$ then any decomposition of $\pi$ as a product
of transpositions has the same parity. Hint: Consider the
polynomial

$$p(x_1,\ldots,x_n) = \prod_{i<j} (x_i - x_j)$$

and let $\pi(p) = \prod_{i<j} (x_{\pi(i)} - x_{\pi(j)})$. Show that $\pi(p) = p$ if $\pi$ is
the product of an even number of transpositions, and $\pi(p) = -p$ if
$\pi$ is the product of an odd number of transpositions.




CHAPTER 15 



Affine Geometry 



Contents: Affine Geometry, Affine Combinations, Affine Hulls, The
Lattice of Flats, Affine Independence, Affine Transformations,
Projective Geometry, Exercises.

In this chapter, we will study the geometry of a finite dimensional 
vector space V, along with its structure preserving maps. Throughout 
this chapter, all vector spaces are assumed to be finite dimensional. 



Affine Geometry 

Definition Let $V$ be a vector space. If $v \in V$ and $S$ is a subspace of
$V$, then the set

$$v + S = \{v + s \mid s \in S\}$$

is called a flat, or coset in $V$. The set $\mathcal{A}(V)$ of all flats in $V$ is
called the affine geometry of $V$. The dimension $\dim(\mathcal{A}(V))$ of $\mathcal{A}(V)$
is defined to be $\dim(V)$. □

It is clear that a flat in V is nothing more than a translated 
subspace of V. We will denote subspaces of V by the letters S,T,. . . 
and flats in V by X,Y,.... Here are some of the basic intersection 
properties of flats. 

Theorem 15.1

1) The following are equivalent:

a) $x + S = y + S$   b) $x \in y + S$   c) $x - y \in S$

Let $X = x + S$ and $Y = y + T$ be flats in $V$. Then

2) $S \subseteq T \iff v + X \subseteq Y$ for some $v \in V$

3) $S = T \iff v + X = Y$ for some $v \in V$

4) $X \cap Y \ne \emptyset,\ S \subseteq T \implies X \subseteq Y$

5) $X \cap Y \ne \emptyset,\ S = T \implies X = Y$

Proof. We leave proof of part (1) as an exercise. To prove (2), observe
that $S = -x + X$ and $T = -y + Y$, and so

$$S \subseteq T \iff -x + X \subseteq -y + Y \iff (y-x) + X \subseteq Y$$

As for (3), if $S = T$ then we have

$$(y-x) + X \subseteq Y \quad\text{and}\quad (x-y) + Y \subseteq X$$

and so

$$(y-x) + X \subseteq Y \subseteq (y-x) + X$$

which implies that $(y-x) + X = Y$.

To prove (4), let $z \in X \cap Y$. Then part (2) tells us that
$v + X \subseteq Y$ for some $v$, and so $v + z = y' \in Y$ for some $y'$, which
implies that $v = y' - z \in T$, since $y'$ and $z$ both lie in $Y$.
Hence, $X \subseteq -v + Y \subseteq Y$. Part (5) follows from (4). ∎

Part (1) of the previous theorem says that a flat can be
represented in many ways, in the form $x + S$. When a flat is written
$x + S$, we refer to $x$ as the flat representative, or coset representative of
the flat. Any element of a flat can be used as a flat representative. On
the other hand, part (3) of Theorem 15.1, with $v = 0$, implies that
each flat $x + S$ is associated with a unique subspace $S$. This allows us
to make the following definition.

Definition The dimension of a flat $x + S$ is $\dim(S)$. A flat of
dimension k is called a k-flat. A 0-flat is a point, a 1-flat is a line and
a 2-flat is a plane. A flat of dimension $\dim(\mathcal{A}(V)) - 1$ is called a
hyperplane. □

Definition Two flats $X = x + S$ and $Y = y + T$ are said to be
parallel if $S \subseteq T$ or $T \subseteq S$. This is denoted by $X \parallel Y$. □

According to Theorem 15.1, if $X \parallel Y$, then $X \subseteq Y$, $Y \subseteq X$ or
$X \cap Y = \emptyset$. Moreover, part (2) of Theorem 15.1 says that $X$ and $Y$
are parallel if and only if some translation of one of these flats is
contained in the other.

Affine Combinations 

If $r_i \in F$ and $r_1 + \cdots + r_n = 1$, then the linear combination

$$r_1x_1 + \cdots + r_nx_n$$

is referred to as an affine combination of the vectors $x_1,\ldots,x_n$.

Theorem 15.2 If $\mathrm{char}(F) \ne 2$, then the following are equivalent for a
subset $X$ of $V$.

1) $X$ is closed under the taking of affine combinations of any two of
its points, that is,

$$x,y \in X \implies rx + (1-r)y \in X$$

2) $X$ is closed under the taking of affine combinations, that is,

$$x_1,\ldots,x_n \in X,\ r_1 + \cdots + r_n = 1 \implies r_1x_1 + \cdots + r_nx_n \in X$$

Proof. It is clear that (2) implies (1). For the converse, we proceed by
induction. According to (1), for $x_1,x_2 \in X$,

$$r_1 + r_2 = 1 \implies r_1x_1 + r_2x_2 \in X$$

Assume for the purposes of induction that for $x_i \in X$

$$r_1 + \cdots + r_{n-1} = 1 \implies r_1x_1 + \cdots + r_{n-1}x_{n-1} \in X$$

Let $x_1,\ldots,x_n \in X$ and $r_1 + \cdots + r_n = 1$, and consider the affine
combination

$$z = r_1x_1 + \cdots + r_nx_n$$

If one of $r_1$ or $r_2$ is different from 1, say $r_1 \ne 1$, then we may write

$$z = r_1x_1 + (1 - r_1)\left(\frac{r_2}{1-r_1}x_2 + \cdots + \frac{r_n}{1-r_1}x_n\right)$$

and since the sum of the coefficients of the sum inside the large
parentheses is 1, the induction hypothesis implies that this sum is in
$X$. Then (1) shows that $z \in X$. On the other hand, if $r_1 = r_2 = 1$,
then since $\mathrm{char}(F) \ne 2$, we may write

$$z = 2\left[\tfrac{1}{2}x_1 + \tfrac{1}{2}x_2\right] + r_3x_3 + \cdots + r_nx_n$$

and since (1) implies that $\tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 \in X$, we may again deduce from
the induction hypothesis that $z \in X$. In any case, $z \in X$, and so (2)
holds. ∎

Note that the requirement $\mathrm{char}(F) \ne 2$ is necessary, for the subset
$X = \{(0,0), (1,0), (0,1)\}$ of $(\mathbb{F}_2)^2$ satisfies (1), but not (2), in Theorem
15.2. We can now characterize flats.
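
The counterexample over $\mathbb{F}_2$ is small enough to check exhaustively. The following sketch (function names are hypothetical) enumerates all affine combinations of two and of three points of $X$, with arithmetic modulo 2.

```python
from itertools import product

P = 2  # work over the field F_2
X = {(0, 0), (1, 0), (0, 1)}

def affine_combination(coeffs, points):
    """Sum of r_i * x_i in (F_2)^2; the caller ensures the r_i sum to 1 mod 2."""
    return tuple(sum(r * x[k] for r, x in zip(coeffs, points)) % P for k in range(2))

def closed_under(n):
    """Is X closed under affine combinations of n of its points?"""
    for points in product(X, repeat=n):
        for coeffs in product(range(P), repeat=n):
            if sum(coeffs) % P == 1 and affine_combination(coeffs, points) not in X:
                return False
    return True

assert closed_under(2)        # condition (1) of Theorem 15.2 holds
assert not closed_under(3)    # condition (2) fails: (0,0) + (1,0) + (0,1) = (1,1)
```

Condition (1) holds trivially here because the only coefficient pairs summing to 1 in $\mathbb{F}_2$ are (1,0) and (0,1).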

Theorem 15.3

1) A subset $X$ of $V$ is a flat in $V$ if and only if it is closed under
the taking of affine combinations, that is,

$$x_1,\ldots,x_n \in X,\ r_1 + \cdots + r_n = 1 \implies r_1x_1 + \cdots + r_nx_n \in X$$

2) If $\mathrm{char}(F) \ne 2$, a subset $X$ of $V$ is a flat if and only if $X$
contains the line through any two of its points, that is, if and only
if

$$x,y \in X \implies rx + (1-r)y \in X$$

Proof. Suppose that $X = x + S$ is a flat, and $x_1,\ldots,x_n \in X$. Then
$x_i = x + s_i$, for $s_i \in S$, and so if $\sum r_i = 1$, we have

$$\sum_i r_ix_i = \sum_i r_i(x + s_i) = x + \sum_i r_is_i \in x + S$$

and so $X$ is closed under affine combinations. Conversely, suppose
that $X$ is closed under the taking of affine combinations, and let

$$S = \{x - x_0 \mid x \in X\}$$

for some $x_0 \in X$. If $x_1 - x_0,\ldots,x_n - x_0$ are arbitrary vectors in $S$,
and $r_1,\ldots,r_n \in F$, then

$$\sum_i r_i(x_i - x_0) = r_1x_1 + \cdots + r_nx_n + (1 - r_1 - \cdots - r_n)x_0 - x_0 \in S$$

Thus, $S$ is closed under the taking of linear combinations, and so is a
subspace of $V$. This implies that $X = x_0 + S$ is a flat. Part (2)
follows from part (1) and Theorem 15.2. ∎



Affine Hulls 

The following definition gives the analog of the subspace spanned 
by a collection of vectors. 



Definition Let $C$ be a nonempty set of vectors in $V$. The affine hull
$\mathrm{hull}(C)$ of $C$ is the smallest flat containing $C$. We also refer to
$\mathrm{hull}(C)$ as the flat generated by $C$. □



Theorem 15.4 Let $C$ be any nonempty subset of $V$. The affine hull
$\mathrm{hull}(C)$ is the set of all affine combinations of vectors in $C$

$$\mathrm{hull}(C) = \left\{ \sum_{i=1}^{n} r_ix_i \;\middle|\; n \ge 1,\ x_1,\ldots,x_n \in C,\ \sum_{i=1}^{n} r_i = 1 \right\}$$

Proof. According to Theorem 15.3, any flat containing $C$ must
contain all affine combinations of vectors in $C$. It remains only to
show that the set $X$ of all such affine combinations is a flat. To this
end, let $y \in X$, and consider the set

$$S = \{y_1 - y \mid y_1 \in X\}$$

It suffices to show that $S$ is a subspace of $V$, for then $X = y + S$ is
indeed a flat. To this end, let

$$y = \sum_{i=1}^{n} r_{0,i}x_i, \quad y_1 = \sum_{i=1}^{n} r_{1,i}x_i \quad\text{and}\quad y_2 = \sum_{i=1}^{n} r_{2,i}x_i$$

(By including additional zero coefficients if necessary, we may assume
that the upper limits of summation are the same.) Hence, any linear
combination of $y_1 - y$ and $y_2 - y$ has the form

$$z = s(y_1 - y) + t(y_2 - y)$$
$$= s\sum_{i=1}^{n} r_{1,i}x_i + t\sum_{i=1}^{n} r_{2,i}x_i - (s+t)y$$
$$= \sum_{i=1}^{n} (sr_{1,i} + tr_{2,i})x_i - (s+t-1)y - y$$
$$= \sum_{i=1}^{n} \left(sr_{1,i} + tr_{2,i} - (s+t-1)r_{0,i}\right)x_i - y$$

But,

$$\sum_{i=1}^{n} \left(sr_{1,i} + tr_{2,i} - (s+t-1)r_{0,i}\right) = s\sum_{i=1}^{n} r_{1,i} + t\sum_{i=1}^{n} r_{2,i} - (s+t-1)\sum_{i=1}^{n} r_{0,i} = s + t - (s+t-1) = 1$$

which shows that $z \in S$. Hence, $S$ is a subspace of $V$. ∎

The affine hull of a finite set of vectors is denoted by
$\mathrm{hull}\{x_1,\ldots,x_n\}$. We leave it as an exercise to show that, for any $i$,

$$\mathrm{hull}\{x_1,\ldots,x_n\} = x_i + \langle x_1 - x_i,\ldots,x_{i-1} - x_i,x_{i+1} - x_i,\ldots,x_n - x_i\rangle \tag{15.1}$$

where $\langle x_1 - x_i,\ldots,x_n - x_i\rangle$ denotes the subspace spanned
by the vectors within the angle brackets. This shows that

$$\dim(\mathrm{hull}\{x_1,\ldots,x_n\}) \le n - 1$$

The affine hull of a pair of distinct points is the line through those
points, denoted by

$$\overline{xy} = \{rx + (1-r)y \mid r \in F\} = y + \langle x - y\rangle$$
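
Formula (15.1) reduces the dimension of an affine hull to a rank computation. The sketch below assumes vectors in $\mathbb{Q}^n$ given as tuples and uses exact rational arithmetic; `rank` and `hull_dim` are hypothetical helper names.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def hull_dim(points, i=0):
    """dim hull{x_1..x_n}: rank of the differences x_j - x_i (j != i), as in (15.1)."""
    base = points[i]
    diffs = [[a - b for a, b in zip(p, base)] for j, p in enumerate(points) if j != i]
    return rank(diffs)

collinear = [(0, 0, 1), (1, 1, 1), (2, 2, 1)]
triangle = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
assert hull_dim(collinear) == 1          # three points on a line
assert hull_dim(triangle) == 2           # attains the bound n - 1
# The answer does not depend on which x_i is used as the base point.
assert all(hull_dim(triangle, i) == 2 for i in range(3))
```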



The Lattice of Flats 

Since flats are subsets of $V$, they are partially ordered by set
inclusion.

Theorem 15.5 The intersection of a nonempty collection
$\mathcal{C} = \{x_i + S_i \mid i \in K\}$ of flats in $V$ is either empty or is a flat. If the
intersection is nonempty, then

$$\bigcap_{i \in K} (x_i + S_i) = x + \bigcap_{i \in K} S_i$$

for any vector $x$ in the intersection.

Proof. If

$$x \in \bigcap_{i \in K} (x_i + S_i)$$

then $x_i + S_i = x + S_i$ for all $i \in K$, and so

$$\bigcap_{i \in K} (x_i + S_i) = \bigcap_{i \in K} (x + S_i) = x + \bigcap_{i \in K} S_i \qquad ∎$$

Definition The join of a nonempty collection $\mathcal{C} = \{x_i + S_i \mid i \in K\}$ of
flats in $V$ is the smallest flat containing all flats in $\mathcal{C}$. We denote the
join of the collection $\mathcal{C}$ of flats by $\bigvee \mathcal{C}$, or by

$$\bigvee_{i \in K} \{x_i + S_i\}$$

The join of two flats is denoted by $(x + S) \vee (y + T)$. □

Theorem 15.6 Let $\mathcal{C} = \{x_i + S_i \mid i \in K\}$ be a nonempty collection of
flats in $V$.

1) $\bigvee \mathcal{C}$ is the intersection of all flats that contain all flats in $\mathcal{C}$.

2) $\bigvee \mathcal{C}$ is $\mathrm{hull}(C)$, where $C$ is the union of all flats in $\mathcal{C}$. ∎

Theorem 15.7 For any two flats in $V$,

$$(x + S) \vee (y + T) = x + [\langle x - y\rangle + S + T]$$

Proof. Since $x,y \in (x + S) \vee (y + T)$, we have

$$(x + S) \vee (y + T) = x + U = y + U$$

for some subspace $U$ of $V$. Hence, $x - y \in U$, and so $\langle x - y\rangle \subseteq U$.
Moreover, $x + S \subseteq x + U$ implies that $S \subseteq U$, and similarly $T \subseteq U$.
Hence,

$$x + [\langle x - y\rangle + S + T] \subseteq x + U$$

Since $x + S$ and $y + T$ are both contained in $x + [\langle x - y\rangle + S + T]$, we
deduce that $x + U \subseteq x + [\langle x - y\rangle + S + T]$. The result follows. ∎

We can now describe the dimension of the join of two flats.

Theorem 15.8 Let $X = x + S$ and $Y = y + T$ be flats in $V$.

1) If $X \cap Y \ne \emptyset$ then

a) $X \vee Y = x + S + T$

b) $\dim(X \vee Y) = \dim(S + T) = \dim(X) + \dim(Y) - \dim(X \cap Y)$

2) If $X \cap Y = \emptyset$ then

$$\dim(X \vee Y) = \dim(S + T) + 1$$

Proof. Using Theorem 15.7, we have

$$(x + S) \cap (y + T) \ne \emptyset \iff \exists\, s \in S,\ t \in T \text{ s.t. } x + s = y + t$$
$$\iff x - y \in S + T \iff \langle x - y\rangle + S + T = S + T \iff X \vee Y = x + S + T$$

This establishes (1a) and (2): if $X \cap Y = \emptyset$ then $x - y \notin S + T$, and so
$\langle x - y\rangle + S + T$ has dimension $\dim(S + T) + 1$. As to (1b), note that

$$\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$$

and that, if $(x + S) \cap (y + T) \ne \emptyset$, then for any $z$ in this intersection,
Theorem 15.5 gives

$$\dim(S \cap T) = \dim(z + [S \cap T]) = \dim([x + S] \cap [y + T]) = \dim(X \cap Y) \qquad ∎$$
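
Theorems 15.7 and 15.8 turn the dimension of a join into a rank computation on spanning sets. The following sketch assumes flats in $\mathbb{Q}^3$ represented as a pair (representative, spanning list of the subspace); this representation and the names `rank`, `join_dim`, `meets` are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def join_dim(X, Y):
    """dim(X v Y) = dim(<x - y> + S + T), by Theorem 15.7."""
    (x, S), (y, T) = X, Y
    d = [a - b for a, b in zip(x, y)]
    return rank(S + T + [d])

def meets(X, Y):
    """X and Y intersect exactly when x - y lies in S + T."""
    (x, S), (y, T) = X, Y
    d = [a - b for a, b in zip(x, y)]
    return rank(S + T + [d]) == rank(S + T)

x_axis = ([0, 0, 0], [[1, 0, 0]])
y_axis = ([0, 0, 0], [[0, 1, 0]])
skew = ([0, 0, 1], [[0, 1, 0]])     # parallel to the y-axis, lifted to z = 1

assert meets(x_axis, y_axis) and join_dim(x_axis, y_axis) == 2   # 1 + 1 - 0
assert not meets(x_axis, skew)
assert join_dim(x_axis, skew) == rank(x_axis[1] + skew[1]) + 1 == 3
```

Two skew lines in three-dimensional space therefore have the whole space as their join, in agreement with part (2).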

Affine Independence 

We now discuss the affine counterpart of linear independence. 

Theorem 15.9 Let $x_1,\ldots,x_n$ be vectors in $V$. The following are
equivalent.

1) $X = \mathrm{hull}\{x_1,\ldots,x_n\}$ has dimension $n - 1$.

2) $\{x_1 - x_i,\ldots,x_{i-1} - x_i,x_{i+1} - x_i,\ldots,x_n - x_i\}$ is linearly
independent for all $i = 1,\ldots,n$.

3) $x_i \notin \mathrm{hull}\{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n\}$ for all $i = 1,\ldots,n$.

4) If $\sum_j r_jx_j$ and $\sum_j s_jx_j$ are affine combinations, then

$$\sum_j r_jx_j = \sum_j s_jx_j \implies r_j = s_j \text{ for all } j$$

Proof. The fact that (1) and (2) are equivalent follows directly from
(15.1). If (3) does not hold, we have, for some $i$,

$$\mathrm{hull}\{x_1,\ldots,x_n\} = \mathrm{hull}\{x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_n\}$$

where by (15.1), the latter has dimension at most $n - 2$. Hence, (1)
cannot hold, and so (1) implies (3).

Next we show that (3) implies (4). Suppose that (3) holds, and
that $\sum_j r_jx_j = \sum_j s_jx_j$. Setting $t_j = r_j - s_j$ gives

$$\sum_j t_jx_j = 0 \quad\text{and}\quad \sum_j t_j = 0$$

But if any of the $t_j$'s are nonzero, say $t_1 \ne 0$, then dividing by $t_1$
gives

$$x_1 + \sum_{j>1} (t_j/t_1)x_j = 0$$

or

$$x_1 = -\sum_{j>1} (t_j/t_1)x_j$$

where

$$-\sum_{j>1} (t_j/t_1) = 1$$

Hence, $x_1 \in \mathrm{hull}\{x_2,\ldots,x_n\}$. This contradiction implies that $t_j = 0$
for all $j$, that is, $r_j = s_j$ for all $j$. Thus, (3) implies (4).

Finally, we show that (4) implies (2). For concreteness, let us
show that (4) implies that $\{x_2 - x_1,\ldots,x_n - x_1\}$ is linearly
independent. Indeed, if $\alpha_2,\ldots,\alpha_n \in F$, then

$$\sum_{j \ge 2} \alpha_j(x_j - x_1) = 0 \implies \Big(1 - \sum_{j \ge 2} \alpha_j\Big)x_1 + \sum_{j \ge 2} \alpha_jx_j = x_1$$

But the latter is an equality between two affine combinations, and so
corresponding coefficients must be equal, which implies that $\alpha_j = 0$ for
all $j = 2,\ldots,n$. This shows that (4) implies (2). ∎

Definition The vectors $x_1,\ldots,x_n$ are affinely independent if they
satisfy any (and hence all) of the conditions of Theorem 15.9. □

Theorem 15.10 If $X$ is a flat of dimension n, then there exist $n+1$
vectors $x_1,\ldots,x_{n+1}$ for which every vector $x \in X$ has a unique
expression as an affine combination

$$x = r_1x_1 + \cdots + r_{n+1}x_{n+1}$$

The coefficients $r_i$ are called the barycentric coordinates of $x$ with
respect to the vectors $x_1,\ldots,x_{n+1}$. ∎
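
Barycentric coordinates can be found by solving one linear system: the coordinate equations $\sum r_ix_i = x$ together with $\sum r_i = 1$. The sketch below assumes the flat is all of $\mathbb{Q}^n$ and that the $n+1$ given points are affinely independent; `solve` and `barycentric` are hypothetical helper names.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the square system a r = b by Gauss-Jordan elimination (a assumed invertible)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for i in range(n):
            if i != c:
                factor = m[i][c]
                m[i] = [v - factor * w for v, w in zip(m[i], m[c])]
    return [row[n] for row in m]

def barycentric(x, vertices):
    """Coefficients r_i with sum r_i x_i = x and sum r_i = 1.

    Assumes the vertices are n + 1 affinely independent points of F^n.
    """
    n = len(x)
    rows = [[v[k] for v in vertices] for k in range(n)] + [[1] * len(vertices)]
    return solve(rows, list(x) + [1])

tri = [(0, 0), (4, 0), (0, 4)]
r = barycentric((1, 2), tri)
assert r == [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
assert sum(r) == 1
```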

Affine Transformations 

Now let us discuss some properties of maps that preserve affine 
structure. 

Definition A function $f\colon V \to V$ that preserves affine combinations, that
is, for which

$$\sum_i r_i = 1 \implies f\Big(\sum_i r_ix_i\Big) = \sum_i r_if(x_i)$$

is called an affine transformation (or affine map, or affinity). □






We should mention that some authors require that f be bijective
in order to be an affine map. The following theorem is the analog of
Theorem 15.2.

Theorem 15.11 If $\mathrm{char}(F) \ne 2$, then the following are equivalent for a
function $f\colon V \to V$.

1) f preserves affine combinations of any two of its points, that is,

$$f(rx + (1-r)y) = rf(x) + (1-r)f(y)$$

2) f preserves affine combinations, that is,

$$\sum_i r_i = 1 \implies f\Big(\sum_i r_ix_i\Big) = \sum_i r_if(x_i)$$

Thus, if $\mathrm{char}(F) \ne 2$, then a map f is an affine transformation if
and only if it sends the line through $x$ and $y$ to the line through $f(x)$
and $f(y)$. It is clear that linear transformations are affine
transformations. So are the following maps.

Definition Let $v \in V$. The affine map $T_v\colon V \to V$ defined by

$$T_v(x) = x + v$$

for all $x \in V$, is called translation by $v$. □

It is not hard to see that any map of the form $T_v \circ \tau$, where
$\tau \in \mathcal{L}(V)$, is affine. Conversely, any affine map must have this form.

Theorem 15.12 A function $f\colon V \to V$ is an affine transformation if and
only if $f = T_v \circ \tau$, where $v \in V$ and $\tau \in \mathcal{L}(V)$.

Proof. We leave proof that $T_v \circ \tau$ is an affine transformation to the
reader. Conversely, suppose that f is an affine map. Then

$$f(rx + sy) = f(rx + sy + (1 - r - s)0) = rf(x) + sf(y) + (1 - r - s)f(0)$$

Rearranging gives

$$f(rx + sy) - f(0) = r[f(x) - f(0)] + s[f(y) - f(0)]$$

which is equivalent to

$$(T_{-f(0)} \circ f)(rx + sy) = r(T_{-f(0)} \circ f)(x) + s(T_{-f(0)} \circ f)(y)$$

and so $\tau = T_{-f(0)} \circ f$ is linear. Thus, $f = T_{f(0)} \circ \tau$. ∎
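
The proof is constructive: the translation vector is $f(0)$ and the linear part is $x \mapsto f(x) - f(0)$. The sketch below recovers both from a sample affine map on $\mathbb{Q}^2$; the map `f` and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def f(x):
    """A sample affine map on Q^2: a linear map followed by a translation."""
    return (2 * x[0] + x[1] + 3, x[0] - x[1] - 1)

def linear_part(f, n):
    """Return f(0) and the columns tau(e_j) = f(e_j) - f(0) of tau = T_{-f(0)} o f."""
    zero = tuple(Fraction(0) for _ in range(n))
    f0 = f(zero)
    cols = []
    for j in range(n):
        e = tuple(Fraction(int(i == j)) for i in range(n))
        cols.append(tuple(a - b for a, b in zip(f(e), f0)))
    return f0, cols

def apply(f0, cols, x):
    """Evaluate (T_{f(0)} o tau)(x) = f(0) + sum_j x_j * tau(e_j)."""
    return tuple(f0[k] + sum(x[j] * cols[j][k] for j in range(len(x)))
                 for k in range(len(f0)))

v, cols = linear_part(f, 2)
assert v == (3, -1)                      # the translation vector f(0)
assert cols == [(2, 1), (1, -1)]         # images of e_1, e_2 under tau
x = (Fraction(5, 2), Fraction(-7))
assert apply(v, cols, x) == f(x)         # f = T_{f(0)} o tau at this point
```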

Corollary 15.13

1) The composition of two affine transformations is an affine
transformation.

2) An affine transformation $f = T_v \circ \tau$ is bijective if and only if $\tau$
is bijective.

3) The set $\mathrm{Aff}(V)$ of all bijective affine transformations on $V$ is a
group under composition of maps, called the affine group of $V$. ∎



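Let us make a few remarks for those familiar with the basics of
group theory. The set $\mathrm{Trans}(V)$ of all translations of $V$ is a
subgroup of $\mathrm{Aff}(V)$. We can define a function $\phi\colon \mathrm{Aff}(V) \to \mathcal{L}(V)$ by

$$\phi(T_v \circ \tau) = \tau$$

It is not hard to see that $\phi$ is a well-defined group homomorphism
from $\mathrm{Aff}(V)$ onto the group $GL(V)$ of invertible operators in $\mathcal{L}(V)$,
with kernel $\mathrm{Trans}(V)$. Hence, $\mathrm{Trans}(V)$ is
a normal subgroup of $\mathrm{Aff}(V)$ and

$$\frac{\mathrm{Aff}(V)}{\mathrm{Trans}(V)} \approx GL(V)$$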



Projective Geometry 

If $\dim(V) = 2$, then the join of any two distinct points in $V$ is a
line. On the other hand, it is not the case that the intersection of any 
two lines is a point. Thus, we see a certain asymmetry between the 
concepts of points and lines in V. This asymmetry can be removed by 
constructing the so-called projective plane. Our plan here is to very 
briefly describe one possible construction of projective geometries of all 
dimensions. 

By way of motivation, let us consider Figure 15.1. 




Figure 15.1

Note that $H$ is a hyperplane in a 3-dimensional vector space $V$ and
that $0 \notin H$. Now, the set $\mathcal{A}(H)$ of all flats of $V$ that lie in $H$ is an
affine geometry of dimension 2. (According to our definition of affine
geometry, $H$ must be a vector space in order to define $\mathcal{A}(H)$.
However, we hereby extend the definition of affine geometry to include
the collection of all flats contained in a flat of $V$.)

To each flat $X$ in $H$, we associate the subspace $\langle X\rangle$ of $V$
generated by $X$. This defines a function

$$P\colon \mathcal{A}(H) \to \mathcal{S}(V), \qquad P(X) = \langle X\rangle$$

where $\mathcal{S}(V)$ is the set of all subspaces of $V$. Note that $P$ is not onto
$\mathcal{S}(V)$, but only because $\mathrm{im}(P)$ does not contain any subspaces of the
subspace $K$ that contains the origin and is parallel to $H$. Figure 15.1
shows a one-dimensional flat $X$, and its image $P(X) = \langle X\rangle$, as well as
a zero-dimensional flat $Y$, and its image $\langle Y\rangle$. Note that, for any flat
$X$ in $H$, we have $\dim(P(X)) = \dim(X) + 1$.

Note also that if $L_1$ and $L_2$ are any two distinct lines in $H$, the
corresponding planes $P(L_1)$ and $P(L_2)$ have the property that their
intersection is a line through the origin. We are now ready to define
projective geometries.

Definition Let $V$ be a vector space. The set $\mathcal{P}(V)$ of all subspaces of
$V$ is called the projective geometry of $V$. If $S$ is a subspace of $V$, its
projective dimension, denoted by $\mathrm{pdim}(S)$, is equal to $\dim(S) - 1$. The
projective dimension of $\mathcal{P}(V)$ is defined to be $\mathrm{pdim}(V) = \dim(V) - 1$.
A subspace of projective dimension 0, 1 or 2 is called a projective
point, projective line, or projective plane, respectively. □

Thus, referring to Figure 15.1, a projective point is a line through 
the origin and, provided that it is not contained in the plane K 
described earlier, it meets H in an (affine) point. Similarly, a 
projective line is a plane through the origin and, provided that it is not 
K, it will meet H in a line. (This holds in higher dimensions as well.) 
In short, 

P(point) = projective point, P(line) = projective line 
P(plane) = projective plane 

and so on. 
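
The correspondence can be tried out in coordinates. The following sketch assumes $V = \mathbb{Q}^3$ with $H$ the hyperplane $z = 1$ (so that $K$ is the plane $z = 0$), and represents a flat $x + S$ in $H$ as a pair (representative, spanning list of $S$); these choices and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def P(flat):
    """Spanning set of the subspace <x> + S generated by the flat x + S."""
    x, S = flat
    return [list(x)] + [list(s) for s in S]

def pdim(spanning):
    """Projective dimension of the subspace spanned by the given vectors."""
    return rank(spanning) - 1

L1 = ((0, 0, 1), [(1, 0, 0)])    # the line y = 0 in H
L2 = ((0, 1, 1), [(1, 0, 0)])    # the parallel line y = 1 in H
point = ((2, 3, 1), [])

assert pdim(P(point)) == 0 and pdim(P(L1)) == 1
# dim(P(L1) n P(L2)) = dim P(L1) + dim P(L2) - dim(P(L1) + P(L2))
meet_dim = rank(P(L1)) + rank(P(L2)) - rank(P(L1) + P(L2))
assert meet_dim == 1
```

The two affine lines are disjoint, yet the corresponding projective lines meet in a projective point, here the line through the origin spanned by $(1,0,0)$, which lies in $K$.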

Given a vector space $V$ of any dimension, and any hyperplane $H$
in $V$ not containing the origin, we can define the function
$P\colon \mathcal{A}(H) \to \mathcal{P}(V)$, as shown in Figure 15.1 for $\dim(V) = 3$. It is also
clear from this figure that a projective geometry of projective dimension
n is an "extension" of an affine geometry of (affine) dimension n,
formed in such a way that all "objects" intersect. More specifically, the
map $P\colon \mathcal{A}(H) \to \mathcal{P}(V)$ satisfies the properties described in the following
theorem.

Theorem 15.14 The map $P\colon \mathcal{A}(H) \to \mathcal{P}(V)$ from the affine geometry
$\mathcal{A}(H)$ to the projective geometry $\mathcal{P}(V)$ satisfies the following.

1) $P$ is injective, with inverse given by

$$P^{-1}(U) = U \cap H$$

2) $\mathrm{im}(P)$ is the set of all subspaces of $V$ that are not contained in
the subspace $K$ parallel to $H$

3) $X \subseteq Y$ if and only if $P(X) \subseteq P(Y)$

4) If $X_i$ are flats in $H$ with nonempty intersection, then

$$P\Big(\bigcap_{i \in K} X_i\Big) = \bigcap_{i \in K} P(X_i)$$

5) For any collection of flats in $H$,

$$P\Big(\bigvee_{i \in K} X_i\Big) = \sum_{i \in K} P(X_i)$$

6) $P$ preserves dimension, in the sense that

$$\mathrm{pdim}(P(X)) = \dim(X)$$

7) $X \parallel Y$ if and only if one of $P(X) \cap K$ and $P(Y) \cap K$ is contained
in the other.



Proof. To prove part (1), let $x + S$ be a flat in $H$. Then $x \in H$, and
so $H = x + K$, which implies that $S \subseteq K$. Note also that
$P(x + S) = \langle x\rangle + S$, and

$$z \in P(x + S) \cap H = (\langle x\rangle + S) \cap (x + K) \implies z = rx + s = x + k$$

for some $s \in S$, $k \in K$ and $r \in F$. This implies that $(1 - r)x \in K$,
which implies that either $x \in K$ or $r = 1$. But $x \in H$ implies $x \notin K$,
and so $r = 1$, which implies that $z = x + s \in x + S$. In other words,

$$P(x + S) \cap H \subseteq x + S$$

Since the reverse inclusion is clear, we have

$$P(x + S) \cap H = x + S$$

This establishes (1).

To prove (2), let $U$ be a subspace of $V$ that is not contained in
$K$. We wish to show that $U$ is in the image of $P$. Note first that
since $U \not\subseteq K$, and $\dim(K) = \dim(V) - 1$, we have $U + K = V$, and so

$$\dim(U \cap K) = \dim(U) + \dim(K) - \dim(U + K) = \dim(U) - 1$$

Now, let $0 \ne x \in U - K$. Then

$$x \notin K \implies \langle x\rangle + K = V$$
$$\implies rx + k \in H \text{ for some } 0 \ne r \in F,\ k \in K \implies rx \in H$$

Thus, $rx \in U \cap H$ for some $0 \ne r \in F$. Hence, the flat $rx + (U \cap K)$
lies in $H$, and

$$\dim(rx + (U \cap K)) = \dim(U \cap K) = \dim(U) - 1$$

which implies that $P(rx + (U \cap K)) = \langle rx\rangle + (U \cap K)$ lies in $U$, and
has the same dimension as $U$. In other words,

$$P(rx + (U \cap K)) = \langle rx\rangle + (U \cap K) = U$$

We leave proof of the remaining parts of the theorem as exercises. ∎

EXERCISES 

1. Show that if $x_1,\ldots,x_n \in V$, then the set
$S = \{\sum r_ix_i \mid \sum r_i = 0\}$ is a subspace of $V$.

2. Prove that $\mathrm{hull}\{x_1,\ldots,x_n\} = x_1 + \langle x_2 - x_1,\ldots,x_n - x_1\rangle$.

3. Prove that the set $X = \{(0,0), (1,0), (0,1)\}$ in $(\mathbb{F}_2)^2$ is closed
under the formation of lines, but not affine hulls.

4. Prove that a flat contains the origin 0 if and only if it is a
subspace.

5. Prove that a flat $X$ is a subspace if and only if for some $x \in X$
we have $rx \in X$ for some $1 \ne r \in F$.

6. Show that the join of a collection $\mathcal{C} = \{x_i + S_i \mid i \in K\}$ of flats in
$V$ is the intersection of all flats that contain all flats in $\mathcal{C}$.

7. Is the collection of all flats in $V$ a lattice under set inclusion? If
not, how can you "fix" this?

8. Prove that if $\dim(X) = \dim(Y)$ and $X \parallel Y$ then $S = T$, where
$X = x + S$ and $Y = y + T$.

9. Suppose that $X = x + S$ and $Y = y + T$ are disjoint hyperplanes
in $V$. Show that $S = T$.

10. (The parallel postulate) Let $X$ be a flat in $V$, and $v \notin X$. Show
that there is exactly one flat containing $v$, parallel to $X$, and
having the same dimension as $X$.

11. a) Find an example to show that the join $X \vee Y$ of two flats
may not be the set of all lines connecting all points in the
union of these flats.

b) Show that if $X$ and $Y$ are flats with $X \cap Y \ne \emptyset$, then
$X \vee Y$ is the union of all lines $\overline{xy}$ where $x \in X$ and $y \in Y$.

12. Show that if $X \parallel Y$ and $X \cap Y = \emptyset$ then
$\dim(X \vee Y) = \max\{\dim(X),\dim(Y)\} + 1$.

13. Let $\dim(V) = 2$. Prove the following.

a) The join of any two distinct points is a line.

b) The intersection of any two nonparallel lines is a point.

14. Let $\dim(V) = 3$. Prove the following.

a) The join of any two distinct points is a line.

b) The intersection of any two nonparallel planes is a line.

c) The join of any two lines whose intersection is a point is a
plane.

d) The intersection of two coplanar nonparallel lines is a point.

e) The join of any two distinct parallel lines is a plane.

f) The join of a line and a point not on that line is a plane.

g) The intersection of a plane and a line not on that plane is a
point.

15. Prove that $f\colon V \to V$ is an affine transformation if and only if
$f = \tau \circ T_w$ for some $w \in V$ and $\tau \in \mathcal{L}(V)$.

16. Verify the group-theoretic remarks about the group
homomorphism $\phi\colon \mathrm{Aff}(V) \to \mathcal{L}(V)$, and the subgroup $\mathrm{Trans}(V)$ of
$\mathrm{Aff}(V)$.



The Umbral Calculus



Contents: Formal Power Series, The Umbral Algebra, Formal Power 
Series as Linear Operators, Sheffer Sequences, Examples of Sheffer 
Sequences, Umbral Operators and Umbral Shifts, Continuous 
Operators on the Umbral Algebra, Operator Adjoints, Automorphisms 
of the Umbral Algebra, Derivations of the Umbral Algebra, Exercises.

In this chapter, we give a brief introduction to a relatively new 
subject, called the umbral calculus. This is an algebraic theory used to 
study certain types of polynomial functions that play an important role 
in applied mathematics. We give only a brief introduction to the
subject, emphasizing the algebraic aspects rather than the applications.
For more on the umbral calculus, we suggest The Umbral Calculus, by 
Roman [1984]. 



Formal Power Series 

We begin with a few remarks concerning formal power series. Let
$\mathcal{F}$ denote the algebra of formal power series in the variable t, with
complex coefficients. Thus, $\mathcal{F}$ is the set of all formal sums of the form

$$f(t) = \sum_{k=0}^{\infty} a_kt^k \tag{16.1}$$

where $a_k \in \mathbb{C}$. Addition and multiplication are purely formal

$$\sum_{k=0}^{\infty} a_kt^k + \sum_{k=0}^{\infty} b_kt^k = \sum_{k=0}^{\infty} (a_k + b_k)t^k$$

and

$$\Big(\sum_{k=0}^{\infty} a_kt^k\Big)\Big(\sum_{k=0}^{\infty} b_kt^k\Big) = \sum_{k=0}^{\infty} \Big(\sum_{j=0}^{k} a_jb_{k-j}\Big)t^k$$

The order $o(f)$ of $f$ is the smallest exponent of t that appears
with a nonzero coefficient. The order of the zero series is $+\infty$. A series
$f$ has a multiplicative inverse, denoted by $f^{-1}$, if and only if $o(f) = 0$.
We leave it to the reader to show that

$$o(fg) = o(f) + o(g)$$

and

$$o(f + g) \ge \min\{o(f),o(g)\}$$

If $f_k$ is a sequence in $\mathcal{F}$ with $o(f_k) \to \infty$ as $k \to \infty$, then for any
series

$$g(t) = \sum_{k=0}^{\infty} b_kt^k$$

we may form the series

$$h(t) = \sum_{k=0}^{\infty} b_kf_k(t)$$

This sum is well-defined since the coefficient of each power of t is a
finite sum. In particular, if $o(f) \ge 1$, then $o(f^k) \to \infty$, and so the
composition

$$(g \circ f)(t) = g(f(t)) = \sum_{k=0}^{\infty} b_kf^k(t)$$

is well-defined. It is easy to see that $o(g \circ f) = o(g)o(f)$.
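
The order rules for products and compositions can be tested on truncated series. The sketch below works modulo $t^N$ with rational coefficients; the truncation, the list representation, and the helper names are assumptions of this illustration and not part of the text.

```python
from fractions import Fraction

N = 8  # truncation: all series are taken modulo t^N (an assumption of this sketch)

def order(f):
    """Smallest exponent with nonzero coefficient; None stands for +infinity (mod t^N)."""
    return next((k for k, a in enumerate(f) if a != 0), None)

def mul(f, g):
    """Cauchy product of two truncated series."""
    h = [Fraction(0)] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if i + j < N:
                h[i + j] += a * b
    return h

def compose(g, f):
    """g(f(t)) = sum_k b_k f(t)^k mod t^N; requires o(f) >= 1."""
    assert f[0] == 0
    result = [Fraction(0)] * N
    power = [Fraction(1)] + [Fraction(0)] * (N - 1)   # f^0 = 1
    for b in g:
        result = [r + b * p for r, p in zip(result, power)]
        power = mul(power, f)
    return result

def series(*coeffs):
    """Pad a finite coefficient list with zeros up to length N."""
    c = [Fraction(a) for a in coeffs]
    return c + [Fraction(0)] * (N - len(c))

f = series(0, 0, 3, 1)       # 3t^2 + t^3, order 2
g = series(0, 5, 0, 0, 2)    # 5t + 2t^4, order 1
assert order(mul(f, g)) == order(f) + order(g) == 3
assert order(compose(g, f)) == order(g) * order(f) == 2
```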

If $o(f) = 1$, then $f$ has a compositional inverse, denoted by $\bar{f}$
and satisfying $(f \circ \bar{f})(t) = (\bar{f} \circ f)(t) = t$. A series $f$ with $o(f) = 1$ is
called a delta series.

The sequence of powers $f^k$ of a delta series $f$ forms a
pseudobasis for $\mathcal{F}$, in the sense that for any $g \in \mathcal{F}$, there exists a
unique sequence of constants $a_k$ for which

$$g(t) = \sum_{k=0}^{\infty} a_kf^k(t)$$

Finally, we note that the formal derivative of the series (16.1) is
given by

$$\partial_tf(t) = f'(t) = \sum_{k=1}^{\infty} ka_kt^{k-1}$$

The operator $\partial_t$ is a derivation, that is,

$$\partial_t(fg) = \partial_t(f)g + f\partial_t(g)$$

The Umbral Algebra

Let $\mathcal{P} = \mathbb{C}[x]$ denote the algebra of polynomials in a single
variable x over the complex field. One of the starting points of the
umbral calculus is the fact that any formal power series in $\mathcal{F}$ can play
three different roles: as a formal power series, as a linear functional on
$\mathcal{P}$, and as a linear operator on $\mathcal{P}$. Let us first explore the connection
between formal power series and linear functionals.

Let $\mathcal{P}^*$ denote the vector space of all linear functionals on $\mathcal{P}$.
Note that $\mathcal{P}^*$ is the algebraic dual space of $\mathcal{P}$, as defined in
Chapter 2. It will be convenient to denote the action of $L \in \mathcal{P}^*$ on
$p(x) \in \mathcal{P}$ by

$$\langle L \mid p(x)\rangle$$

The vector space operations on $\mathcal{P}^*$ then take the form

$$\langle L + M \mid p(x)\rangle = \langle L \mid p(x)\rangle + \langle M \mid p(x)\rangle$$

and

$$\langle rL \mid p(x)\rangle = r\langle L \mid p(x)\rangle, \quad r \in \mathbb{C}$$

Note also that since any linear functional on $\mathcal{P}$ is uniquely determined
by its values on a basis for $\mathcal{P}$, $L \in \mathcal{P}^*$ is uniquely determined by the
values $\langle L \mid x^n\rangle$ for $n \ge 0$.

Now, any formal series in $\mathcal{F}$ can be written in the form

\[ f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \]

and we can use this to define a linear functional $f(t)$ by setting

\[ \langle f(t) \mid x^n \rangle = a_n \]

for $n \ge 0$. In other words, the linear functional $f(t)$ is defined by the
condition

\[ f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k \]

Note in particular that

\[ \langle t^k \mid x^n \rangle = n!\,\delta_{n,k} \]

where $\delta_{n,k}$ is the Kronecker delta function. This implies that

\[ \langle t^k \mid p(x) \rangle = p^{(k)}(0) \]

and so $t^k$ is the functional "$k$th derivative at 0." Also, $t^0$ is
evaluation at 0.

As it happens, any linear functional $L \in \mathcal{P}^*$ has the form $f(t)$.
To see this, we simply note that if

\[ f_L(t) = \sum_{k=0}^{\infty} \frac{\langle L \mid x^k \rangle}{k!} t^k \]

then

\[ \langle f_L(t) \mid x^n \rangle = \langle L \mid x^n \rangle \]

for all $n \ge 0$, and so as linear functionals, $L = f_L(t)$.
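
To make the correspondence between a functional L and its series f_L(t) concrete, here is a small sketch. It assumes polynomials and truncated series are stored as coefficient lists, and it uses integration over [0, 1] as a sample functional; the names pair, series_of and integral_01 are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from math import factorial

def pair(f, p):
    """<f(t) | p(x)> for f = sum f[k] t^k and p = sum p[k] x^k.

    Uses <t^k | x^n> = n! * delta(n, k), so only matching powers contribute.
    The series f must carry at least as many coefficients as p.
    """
    return sum(Fraction(fk) * factorial(k) * pk
               for k, (fk, pk) in enumerate(zip(f, p)))

def series_of(L, n_terms):
    """Coefficients of f_L(t) = sum <L | x^k>/k! t^k, given L on monomials."""
    return [Fraction(L(k)) / factorial(k) for k in range(n_terms)]

# Sample functional: <L | x^n> = integral of x^n over [0, 1] = 1/(n + 1).
integral_01 = lambda n: Fraction(1, n + 1)

p = [1, 2, 3, 4]                   # 1 + 2x + 3x^2 + 4x^3
f_L = series_of(integral_01, len(p))

print(f_L)                         # first coefficients of f_L(t)
print(pair(f_L, p))                # integral of p over [0, 1]
print(pair([0, 0, 1, 0], p))       # <t^2 | p(x)> = p''(0)
```

For this p the integral over [0, 1] is 1 + 1 + 1 + 1 = 4, and p''(0) = 6, which is what the pairing returns.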

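Thus, we can define a map $\phi\colon \mathcal{P}^* \to \mathcal{F}$ by $\phi(L) = f_L(t)$.

Theorem 16.1 The map $\phi\colon \mathcal{P}^* \to \mathcal{F}$ defined by $\phi(L) = f_L(t)$ is a vector
space isomorphism from $\mathcal{P}^*$ onto $\mathcal{F}$.

Proof. To see that $\phi$ is injective, note that

\[ f_L(t) = f_M(t) \;\Rightarrow\; \langle L \mid x^n \rangle = \langle M \mid x^n \rangle \text{ for all } n \ge 0 \;\Rightarrow\; L = M \]

Moreover, the map $\phi$ is surjective, since for any $f \in \mathcal{F}$, the linear
functional $L = f(t)$ has the property that $\phi(L) = f_L(t) = f(t)$. Finally,

\[ \phi(rL + sM) = \sum_{k=0}^{\infty} \frac{\langle rL + sM \mid x^k \rangle}{k!} t^k
 = r \sum_{k=0}^{\infty} \frac{\langle L \mid x^k \rangle}{k!} t^k + s \sum_{k=0}^{\infty} \frac{\langle M \mid x^k \rangle}{k!} t^k
 = r\phi(L) + s\phi(M) \]

∎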

From now on, we shall identify the vector space $\mathcal{P}^*$ with the
vector space $\mathcal{F}$, using the isomorphism $\phi\colon \mathcal{P}^* \to \mathcal{F}$. Thus, we think of
linear functionals on $\mathcal{P}$ simply as formal power series. The advantage
of this approach is that $\mathcal{F}$ is more than just a vector space: it is an
algebra. Hence, we have automatically defined a multiplication of
linear functionals, namely, the product of formal power series. The
algebra $\mathcal{F}$, when thought of as both the algebra of formal power series
and the algebra of linear functionals on $\mathcal{P}$, is called the umbral algebra.

Let us consider an example. 



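Example 16.1 For $a \in \mathbb{C}$, the evaluation functional $\varepsilon_a \in \mathcal{P}^*$ is defined
by

\[ \langle \varepsilon_a \mid p(x) \rangle = p(a) \]

In particular, $\langle \varepsilon_a \mid x^n \rangle = a^n$, and so the formal power series
representation for this functional is

\[ f_{\varepsilon_a}(t) = \sum_{k=0}^{\infty} \frac{\langle \varepsilon_a \mid x^k \rangle}{k!} t^k = \sum_{k=0}^{\infty} \frac{a^k}{k!} t^k = e^{at} \]

which is the exponential series. If $\varepsilon_b$ is evaluation at $b$, then

\[ e^{at} e^{bt} = e^{(a+b)t} \]

and so the product of evaluation at $a$ and evaluation at $b$ is
evaluation at $a + b$. □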
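
The statement that the product of two evaluation functionals is again an evaluation functional can be checked numerically. The sketch below assumes truncated series with exact rational coefficients; the helper names and the sample values of a, b and p are illustrative choices.

```python
from fractions import Fraction
from math import factorial

def exp_series(a, n_terms):
    """First n_terms coefficients of e^{at}."""
    return [Fraction(a) ** k / factorial(k) for k in range(n_terms)]

def multiply(f, g):
    """Cauchy product, truncated to the shorter input length."""
    n = min(len(f), len(g))
    h = [Fraction(0)] * n
    for i in range(n):
        for j in range(n - i):
            h[i + j] += f[i] * g[j]
    return h

def pair(f, p):
    """<f(t) | p(x)> via <t^k | x^n> = n! * delta(n, k)."""
    return sum(fk * factorial(k) * pk for k, (fk, pk) in enumerate(zip(f, p)))

def evaluate(p, x):
    """Value of the polynomial with coefficient list p at x."""
    return sum(c * Fraction(x) ** k for k, c in enumerate(p))

a, b = Fraction(2), Fraction(-1, 3)
p = [5, 0, -1, 2]                                   # 5 - x^2 + 2x^3
product = multiply(exp_series(a, 4), exp_series(b, 4))

print(product == exp_series(a + b, 4))              # the product series is e^{(a+b)t}
print(pair(product, p) == evaluate(p, a + b))       # and it evaluates p at a + b
```

Four coefficients suffice here because p has degree 3, so higher powers of t pair to zero with p.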



When we are thinking of a delta series $f \in \mathcal{F}$ as a linear
functional, we refer to it as a delta functional. Similarly, an invertible
series $f \in \mathcal{F}$ is referred to as an invertible functional. Here are some
simple consequences of the development so far.



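Theorem 16.2

1) For any $f \in \mathcal{F}$,

\[ f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k \]

2) For any $p \in \mathcal{P}$,

\[ p(x) = \sum_{k \ge 0} \frac{\langle t^k \mid p(x) \rangle}{k!} x^k \]

3) For any $f, g \in \mathcal{F}$,

\[ \langle f(t)g(t) \mid x^n \rangle = \sum_{k=0}^{n} \binom{n}{k} \langle f(t) \mid x^k \rangle \langle g(t) \mid x^{n-k} \rangle \]

4) $o(f(t)) > \deg p(x) \;\Rightarrow\; \langle f(t) \mid p(x) \rangle = 0$

5) If $o(f_k) = k$ for all $k \ge 0$, then

\[ \Big\langle \sum_{k=0}^{\infty} a_k f_k(t) \;\Big|\; p(x) \Big\rangle = \sum_{k \ge 0} a_k \langle f_k(t) \mid p(x) \rangle \]

where the sum on the right is a finite one.

6) If $o(f_k) = k$ for all $k \ge 0$, then

\[ \langle f_k(t) \mid p(x) \rangle = \langle f_k(t) \mid q(x) \rangle \text{ for all } k \ge 0 \;\Rightarrow\; p(x) = q(x) \]

7) If $\deg p_k(x) = k$ for all $k \ge 0$, then

\[ \langle f(t) \mid p_k(x) \rangle = \langle g(t) \mid p_k(x) \rangle \text{ for all } k \ge 0 \;\Rightarrow\; f(t) = g(t) \]

Proof. We prove only part (3). Let

\[ f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \quad \text{and} \quad g(t) = \sum_{k=0}^{\infty} \frac{b_k}{k!} t^k \]

Then

\[ f(t)g(t) = \sum_{m=0}^{\infty} \Big( \frac{1}{m!} \sum_{k=0}^{m} \binom{m}{k} a_k b_{m-k} \Big) t^m \]

and applying both sides of this (as linear functionals) to $x^n$ gives

\[ \langle f(t)g(t) \mid x^n \rangle = \sum_{k=0}^{n} \binom{n}{k} a_k b_{n-k} \]

The result now follows from the fact that part (1) implies $a_k = \langle f(t) \mid x^k \rangle$
and $b_{n-k} = \langle g(t) \mid x^{n-k} \rangle$. ∎

We can now present our first "umbral" result.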

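Theorem 16.3 For any $f(t) \in \mathcal{F}$ and $p(x) \in \mathcal{P}$,

\[ \langle f(t) \mid x p(x) \rangle = \langle \partial_t f(t) \mid p(x) \rangle \]

Proof. By linearity, we need only establish this for $p(x) = x^n$. But, if

\[ f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \]

then

\[ \langle \partial_t f(t) \mid x^n \rangle
 = \Big\langle \sum_{k=1}^{\infty} \frac{a_k}{(k-1)!} t^{k-1} \;\Big|\; x^n \Big\rangle
 = \sum_{k=1}^{\infty} \frac{a_k}{(k-1)!}\, n!\, \delta_{k-1,n}
 = a_{n+1} = \langle f(t) \mid x^{n+1} \rangle \]

∎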



Let us consider a few examples of important linear functionals and 
their power series representations. 

Example 16.2

1) We have already encountered the evaluation functional $e^{at}$,
satisfying

\[ \langle e^{at} \mid p(x) \rangle = p(a) \]

2) The forward difference functional is the delta functional $e^{at} - 1$,
satisfying

\[ \langle e^{at} - 1 \mid p(x) \rangle = p(a) - p(0) \]

3) The Abel functional is the delta functional $te^{at}$, satisfying

\[ \langle te^{at} \mid p(x) \rangle = p'(a) \]

4) The invertible functional $(1 - t)^{-1}$ satisfies

\[ \langle (1 - t)^{-1} \mid p(x) \rangle = \int_0^{\infty} p(u) e^{-u}\, du \]

as can be seen by setting $p(x) = x^n$ and expanding the expression
$(1 - t)^{-1}$.

5) To determine the linear functional $f$ satisfying

\[ \langle f(t) \mid p(x) \rangle = \int_0^{a} p(u)\, du \]

we observe that

\[ f(t) = \sum_{k=0}^{\infty} \frac{\langle f(t) \mid x^k \rangle}{k!} t^k = \sum_{k=0}^{\infty} \frac{a^{k+1}}{(k+1)!} t^k = \frac{e^{at} - 1}{t} \]

The inverse $t/(e^{at} - 1)$ of this functional is associated with the
so-called Bernoulli polynomials, which play a very important role
in mathematics and its applications. In fact, the numbers

\[ B_n = \Big\langle \frac{t}{e^{t} - 1} \;\Big|\; x^n \Big\rangle \]

are known as the Bernoulli numbers. □
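
Part 5 of the example suggests a direct way to compute Bernoulli numbers: invert the series (e^t - 1)/t and read off the pairing with x^n. The following is a minimal sketch under that reading, with a = 1; the function name inverse and the truncation length N are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial

def inverse(g, n_terms):
    """Multiplicative inverse of a series with g[0] != 0 (that is, order 0).

    Solves g * h = 1 coefficient by coefficient.
    """
    h = [1 / Fraction(g[0])]
    for n in range(1, n_terms):
        h.append(-sum(g[k] * h[n - k] for k in range(1, n + 1)) / g[0])
    return h

N = 9
g = [Fraction(1, factorial(k + 1)) for k in range(N)]   # (e^t - 1)/t
h = inverse(g, N)                                        # t/(e^t - 1)

# B_n = <t/(e^t - 1) | x^n> = n! times the coefficient of t^n.
bernoulli = [factorial(n) * h[n] for n in range(N)]
print(bernoulli)
```

The first values produced are 1, -1/2, 1/6, 0, -1/30, which agree with the usual Bernoulli numbers under the convention B_1 = -1/2.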



Formal Power Series as Linear Operators 

We now turn to the connection between formal power series and
linear operators on $\mathcal{P}$. Let us denote the $k$-th derivative operator on $\mathcal{P}$
by $t^k$. Thus,

\[ t^k p(x) = p^{(k)}(x) \]

We can then extend this to formal series in $t$

\[ (16.2) \qquad f(t) = \sum_{k=0}^{\infty} \frac{a_k}{k!} t^k \]

by defining the linear operator $f(t)\colon \mathcal{P} \to \mathcal{P}$ by

\[ f(t)p(x) = \sum_{k=0}^{\infty} \frac{a_k}{k!} [t^k p(x)] = \sum_{k \ge 0} \frac{a_k}{k!} p^{(k)}(x) \]

the latter sum being a finite one. Note in particular that

\[ (16.3) \qquad f(t)x^n = \sum_{k=0}^{n} \binom{n}{k} a_k x^{n-k} \]

With this definition, we see that each formal power series $f \in \mathcal{F}$
plays three roles in the umbral calculus, namely, as a formal power
series, as a linear functional, and as a linear operator. The differing
notations $\langle f(t) \mid p(x) \rangle$ and $f(t)p(x)$ will make it clear whether we are
thinking of $f$ as a functional or as an operator.

It is important to note that $f = g$ in $\mathcal{F}$ if and only if $f = g$ as
linear functionals, which holds if and only if $f = g$ as linear operators.
It is also worth noting that

\[ [f(t)g(t)]p(x) = f(t)[g(t)p(x)] \]

and so we may write $f(t)g(t)p(x)$ without ambiguity. In addition,

\[ f(t)g(t)p(x) = g(t)f(t)p(x) \]

for all $f, g \in \mathcal{F}$ and $p \in \mathcal{P}$.

When we are thinking of a delta series $f$ as an operator, we call it
a delta operator. The following theorem describes the key relationship
between linear functionals and linear operators of the form $f(t)$.

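Theorem 16.4 If $f, g \in \mathcal{F}$, then

\[ \langle f(t)g(t) \mid p(x) \rangle = \langle f(t) \mid g(t)p(x) \rangle \]

for all polynomials $p(x) \in \mathcal{P}$.
Proof. If $f$ has the form (16.2), then by (16.3),

\[ (16.4) \qquad \langle t^0 \mid f(t)x^n \rangle = \Big\langle t^0 \;\Big|\; \sum_{k=0}^{n} \binom{n}{k} a_k x^{n-k} \Big\rangle = a_n = \langle f(t) \mid x^n \rangle \]

By linearity, this holds for $x^n$ replaced by any polynomial $p(x)$.
Hence, applying this to the product $fg$ gives

\[ \langle f(t)g(t) \mid p(x) \rangle = \langle t^0 \mid f(t)g(t)p(x) \rangle
 = \langle t^0 \mid f(t)[g(t)p(x)] \rangle = \langle f(t) \mid g(t)p(x) \rangle \]

∎

Equation (16.4) shows that applying the linear functional $f(t)$ is
equivalent to applying the operator $f(t)$, and then following by
evaluation at $x = 0$.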
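
Theorem 16.4 and equation (16.4) can be tried out on small inputs. In the sketch below a series is a list of plain coefficients of t^k (so the factor 1/k! is already absorbed), t acts as d/dx, and all names as well as the sample f, g and p are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import factorial

def derivative(p):
    """Coefficient list of p'(x)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def apply_operator(f, p):
    """f(t)p(x), where f = [c_0, c_1, ...] means sum c_k t^k and t = d/dx."""
    result = [Fraction(0)] * len(p)
    d = list(p)
    for c in f:
        for i, coeff in enumerate(d):
            result[i] += c * coeff
        d = derivative(d)
    return result

def pair(f, p):
    """<f(t) | p(x)> via <t^k | x^n> = n! * delta(n, k)."""
    return sum(Fraction(fk) * factorial(k) * pk
               for k, (fk, pk) in enumerate(zip(f, p)))

def multiply(f, g):
    """Cauchy product truncated to the shorter input length."""
    n = min(len(f), len(g))
    h = [0] * n
    for i in range(n):
        for j in range(n - i):
            h[i + j] += f[i] * g[j]
    return h

f = [1, 2, 0, 1]     # 1 + 2t + t^3
g = [0, 1, 3, 0]     # t + 3t^2
p = [4, -1, 0, 2]    # 4 - x + 2x^3

lhs = pair(multiply(f, g), p)          # <f(t)g(t) | p(x)>
rhs = pair(f, apply_operator(g, p))    # <f(t) | g(t)p(x)>
print(lhs, rhs)

# Equation (16.4): the functional is the operator followed by evaluation at 0.
print(pair(f, p), apply_operator(f, p)[0])
```

Both sides of the theorem come out as 71 for these inputs, and both expressions in the last line equal 14.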

Here are the operator versions of the functionals in Example 16.2. 

Example 16.3

1) The operator $e^{at}$ satisfies

\[ e^{at} x^n = \sum_{k=0}^{\infty} \frac{a^k}{k!} t^k x^n = \sum_{k=0}^{n} \binom{n}{k} a^k x^{n-k} = (x + a)^n \]

and so

\[ e^{at} p(x) = p(x + a) \]

for all $p \in \mathcal{P}$. Thus $e^{at}$ is a translation operator.

2) The forward difference operator is the delta operator $e^{at} - 1$,
where

\[ (e^{at} - 1)p(x) = p(x + a) - p(x) \]

3) The Abel operator is the delta operator $te^{at}$, where

\[ te^{at} p(x) = p'(x + a) \]

4) The invertible operator $(1 - t)^{-1}$ satisfies

\[ (1 - t)^{-1} p(x) = \int_0^{\infty} p(x + u) e^{-u}\, du \]

5) The operator $(e^{at} - 1)/t$ is easily seen to satisfy

\[ \frac{e^{at} - 1}{t}\, p(x) = \int_x^{x+a} p(u)\, du \]

□
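
Parts 1 and 2 of the example lend themselves to a quick check: apply the truncated series for e^{at} as a differential operator and compare with the binomial expansion of p(x + a). This is a sketch under the assumption that len(p) terms of the series are kept, which is enough because higher derivatives of p vanish; the helper names and sample data are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def derivative(p):
    """Coefficient list of p'(x)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def apply_operator(f, p):
    """f(t)p(x), where f = [c_0, c_1, ...] means sum c_k t^k and t = d/dx."""
    result = [Fraction(0)] * len(p)
    d = list(p)
    for c in f:
        for i, coeff in enumerate(d):
            result[i] += c * coeff
        d = derivative(d)
    return result

def exp_series(a, n_terms):
    """First n_terms coefficients of e^{at}."""
    return [Fraction(a) ** k / factorial(k) for k in range(n_terms)]

def translate(p, a):
    """Coefficients of p(x + a), expanded with the binomial theorem."""
    out = [Fraction(0)] * len(p)
    for n, c in enumerate(p):
        for j in range(n + 1):
            out[j] += c * comb(n, j) * Fraction(a) ** (n - j)
    return out

p = [1, 0, 2, 1]     # 1 + 2x^2 + x^3
a = 2

shifted = apply_operator(exp_series(a, len(p)), p)                 # e^{at} p(x)
difference = apply_operator([0] + exp_series(a, len(p))[1:], p)    # (e^{at} - 1) p(x)

print(shifted == translate(p, a))
print(difference == [s - c for s, c in zip(translate(p, a), p)])
```

For this p and a = 2 the translated polynomial is 17 + 20x + 8x^2 + x^3, and the forward difference is that polynomial minus p(x).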
We have seen that all linear functionals on $\mathcal{P}$ have the form $f(t)$,
for $f \in \mathcal{F}$. However, not all linear operators on $\mathcal{P}$ have this form. To
see this, observe that

\[ \deg [f(t)p(x)] \le \deg p(x) \]

but the linear operator $\phi\colon \mathcal{P} \to \mathcal{P}$ defined by $\phi(p(x)) = xp(x)$ does not
have this property. Proof of the following characterization of operators
that do have the form $f(t)$ can be found in Roman [1984].

Theorem 16.5 The following are equivalent for a linear operator
$\tau\colon \mathcal{P} \to \mathcal{P}$.

1) $\tau$ has the form $f(t)$, that is, there exists an $f \in \mathcal{F}$ for which
$\tau = f(t)$, as linear operators.

2) $\tau$ commutes with the derivative operator, that is, $\tau t = t\tau$.

3) $\tau$ commutes with any delta operator $g(t)$, that is, $\tau g(t) = g(t)\tau$.

4) $\tau$ commutes with any translation operator, that is, $\tau e^{at} = e^{at}\tau$. ∎

Sheffer Sequences 

We can now define the principal object of study in the umbral
calculus. When referring to a sequence $s_n(x)$ in $\mathcal{P}$, we shall always
imply that $\deg s_n(x) = n$ for all $n \ge 0$. The proof of the following
result is straightforward, but in the interest of space, it will be omitted.

Theorem 16.6 Let $f$ be a delta series, let $g$ be an invertible series,
and consider the geometric sequence

\[ g, gf, gf^2, gf^3, \ldots \]

in $\mathcal{F}$. Then there is a unique sequence $s_n(x)$ in $\mathcal{P}$ satisfying the
orthogonality conditions

\[ (16.5) \qquad \langle g(t)f^k(t) \mid s_n(x) \rangle = n!\,\delta_{n,k} \]

for all $n, k \ge 0$. ∎

Definition The sequence $s_n(x)$ in (16.5) is called the Sheffer sequence
for the ordered pair $(g(t), f(t))$. We shorten this by saying that $s_n(x)$
is Sheffer for $(g(t), f(t))$. □

Two special types of Sheffer sequences deserve explicit mention.

Definition The Sheffer sequence for a pair of the form $(1, f(t))$ is called
the associated sequence for $f(t)$. The Sheffer sequence for a pair of the
form $(g(t), t)$ is called the Appell sequence for $g(t)$. □

Before considering examples, we wish to describe several
characterizations of Sheffer sequences. First, we require a key result.



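Theorem 16.7 (The expansion theorems) Let $s_n(x)$ be Sheffer for
$(g(t), f(t))$.

1) For any $h \in \mathcal{F}$,

\[ h(t) = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t) \]

2) For any $p \in \mathcal{P}$,

\[ p(x) = \sum_{k \ge 0} \frac{\langle g(t)f^k(t) \mid p(x) \rangle}{k!}\, s_k(x) \]

Proof. Part (1) follows from parts (5) and (7) of Theorem 16.2, since

\[ \Big\langle \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!}\, g(t)f^k(t) \;\Big|\; s_n(x) \Big\rangle
 = \sum_{k=0}^{\infty} \frac{\langle h(t) \mid s_k(x) \rangle}{k!} \langle g(t)f^k(t) \mid s_n(x) \rangle
 = \langle h(t) \mid s_n(x) \rangle \]

Part (2) follows in a similar way from part (6) of Theorem 16.2. ∎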



We can now begin our characterization of Sheffer sequences,
starting with the generating function. The idea of a generating function
is quite simple. If $r_n(x)$ is a sequence of polynomials, we may define a
formal power series of the form

\[ g(t, x) = \sum_{k=0}^{\infty} \frac{r_k(x)}{k!} t^k \]

This is referred to as the (exponential) generating function for the
sequence $r_n(x)$. (The term exponential refers to the presence of $k!$ in
this series. When this is not present, we have an ordinary generating
function.) Since the series is a formal one, knowing $g(t, x)$ is equivalent
(in theory, if not always in practice) to knowing the polynomials $r_n(x)$.
Moreover, a knowledge of the generating function of a sequence of
polynomials can often lead to a deeper understanding of the sequence
itself that might not be otherwise easily accessible. For this reason,
generating functions are studied quite extensively.

For the proofs of the following characterizations, we refer the 
reader to Roman [1984]. 



Theorem 16.8 (Generating function)

1) Let $p_n(x)$ be the associated sequence for $f(t)$. The generating
function of $p_n(x)$ is

\[ e^{y\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{p_k(y)}{k!} t^k \]

where $\bar{f}(t)$ is the compositional inverse of $f(t)$.

2) Let $s_n(x)$ be Sheffer for $(g(t), f(t))$. The generating function of
$s_n(x)$ is

\[ \frac{1}{g(\bar{f}(t))}\, e^{y\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{s_k(y)}{k!} t^k \]

∎

Theorem 16.9 (Conjugate representation)

1) A sequence $p_n(x)$ is the associated sequence for $f(t)$ if and only
if

\[ p_n(x) = \sum_{k=0}^{n} \frac{1}{k!} \langle \bar{f}(t)^k \mid x^n \rangle\, x^k \]

2) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$ if and only if

\[ s_n(x) = \sum_{k=0}^{n} \frac{1}{k!} \langle g(\bar{f}(t))^{-1} \bar{f}(t)^k \mid x^n \rangle\, x^k \]

∎

Theorem 16.10 (Operator characterization)

1) A sequence $p_n(x)$ is the associated sequence for $f(t)$ if and only
if

a) $p_n(0) = \delta_{n,0}$

b) $f(t)p_n(x) = n p_{n-1}(x)$ for $n \ge 0$

2) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$, for some $g(t)$, if and
only if

\[ f(t)s_n(x) = n s_{n-1}(x) \]

for all $n \ge 0$. ∎

Theorem 16.11

1) (The binomial identity) A sequence $p_n(x)$ is the associated
sequence for a delta series $f(t)$ if and only if it is of binomial
type, that is, if and only if it satisfies the identity

\[ p_n(x + y) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, p_{n-k}(x) \]

for all $y \in \mathbb{C}$.

2) (The Sheffer identity) A sequence $s_n(x)$ is Sheffer for $(g(t), f(t))$,
for some $g(t)$, if and only if

\[ s_n(x + y) = \sum_{k=0}^{n} \binom{n}{k} p_k(y)\, s_{n-k}(x) \]

for all $y \in \mathbb{C}$, where $p_n(x)$ is the associated sequence for $f(t)$. ∎

Examples of Sheffer Sequences

We can now give some examples of Sheffer sequences. While it is
often a relatively straightforward matter to verify that a given sequence
is Sheffer for a given pair $(g(t), f(t))$, it is quite another matter to find
the Sheffer sequence for a given pair. The umbral calculus provides two
formulas for this purpose, one of which is direct, but requires the
usually very difficult computation of the series $(f(t)/t)^{-n}$. The other is
a recurrence relation that expresses each $s_n(x)$ in terms of previous
terms in the Sheffer sequence. Unfortunately, space does not permit us
to discuss these formulas in detail. However, we will discuss the
recurrence formula for associated sequences later in this chapter.



Example 16.4 The sequence $p_n(x) = x^n$ is the associated sequence for
the delta series $f(t) = t$. The generating function for this sequence is

\[ e^{yt} = \sum_{k=0}^{\infty} \frac{y^k}{k!} t^k \]

and the binomial identity is precisely that:

\[ (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \]

Example 16.5 The lower factorial polynomials

\[ (x)_n = x(x - 1) \cdots (x - n + 1) \]

form the associated sequence for the forward difference functional

\[ f(t) = e^t - 1 \]

discussed in Example 16.2. To see this, we simply compute, using
Theorem 16.10. Since $(0)_0$ is defined to be 1, we have $(0)_n = \delta_{n,0}$.
Also,

\[ \begin{aligned}
(e^t - 1)(x)_n &= (x + 1)_n - (x)_n \\
 &= (x + 1)x(x - 1) \cdots (x - n + 2) - x(x - 1) \cdots (x - n + 1) \\
 &= x(x - 1) \cdots (x - n + 2)\,[(x + 1) - (x - n + 1)] \\
 &= n x(x - 1) \cdots (x - n + 2) \\
 &= n (x)_{n-1}
\end{aligned} \]

The generating function for the lower factorial polynomials is

\[ e^{y \log(1 + t)} = \sum_{k=0}^{\infty} \frac{(y)_k}{k!} t^k \]

which can be rewritten in the more familiar form

\[ (1 + t)^y = \sum_{k=0}^{\infty} \binom{y}{k} t^k \]

Of course, this is a formal identity, so there is no need to make any
restrictions on $t$. The binomial identity in this case is

\[ (x + y)_n = \sum_{k=0}^{n} \binom{n}{k} (x)_k (y)_{n-k} \]

which can also be written in the form

\[ \binom{x + y}{n} = \sum_{k=0}^{n} \binom{x}{k} \binom{y}{n - k} \]

This is known as the Vandermonde convolution formula.
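
Both computations in this example are polynomial identities, so they can be spot-checked at arbitrary rational points. The sketch below does exactly that; the function names and the sample points are assumptions made for illustration, and a numerical check of this kind is of course not a proof.

```python
from fractions import Fraction
from math import comb

def lower_factorial(x, n):
    """(x)_n = x(x - 1)...(x - n + 1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(n):
        result *= x - i
    return result

def binomial_identity_holds(x, y, n):
    """Check (x + y)_n = sum C(n, k) (x)_k (y)_{n-k} at the given point."""
    lhs = lower_factorial(x + y, n)
    rhs = sum(comb(n, k) * lower_factorial(x, k) * lower_factorial(y, n - k)
              for k in range(n + 1))
    return lhs == rhs

def difference_rule_holds(x, n):
    """Check (e^t - 1)(x)_n = (x + 1)_n - (x)_n = n (x)_{n-1} at the given point."""
    lhs = lower_factorial(x + 1, n) - lower_factorial(x, n)
    return lhs == n * lower_factorial(x, n - 1)

x, y = Fraction(7, 2), Fraction(-5, 3)
print(all(binomial_identity_holds(x, y, n) for n in range(7)))
print(all(difference_rule_holds(x, n) for n in range(1, 7)))
```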

Example 16.6 The Abel polynomials

\[ A_n(x; a) = x(x - an)^{n-1} \]

form the associated sequence for the Abel functional

\[ f(t) = te^{at} \]

also discussed in Example 16.2. We leave verification of this to the
reader. The generating function for the Abel polynomials is

\[ e^{y\bar{f}(t)} = \sum_{k=0}^{\infty} \frac{y(y - ak)^{k-1}}{k!} t^k \]

Taking the formal derivative of this with respect to $y$ gives

\[ \bar{f}(t)\, e^{y\bar{f}(t)} = \sum_{k=1}^{\infty} \frac{k(y - a)(y - ak)^{k-2}}{k!} t^k \]

which, for $y = 0$, gives a formula for the compositional inverse of the
series $f(t) = te^{at}$,

\[ \bar{f}(t) = \sum_{k=1}^{\infty} \frac{(-a)^{k-1} k^{k-2}}{(k-1)!} t^k \]

Example 16.7 The famous Hermite polynomials $H_n(x)$ form the
Appell sequence for the invertible functional

\[ g(t) = e^{t^2/2} \]

We ask the reader to show that $s_n(x)$ is the Appell sequence for $g(t)$
if and only if $s_n(x) = g(t)^{-1} x^n$. Using this fact, we get

\[ H_n(x) = e^{-t^2/2} x^n = \sum_{k \ge 0} \Big( -\frac{1}{2} \Big)^k \frac{(n)_{2k}}{k!}\, x^{n-2k} \]

The generating function for the Hermite polynomials is

\[ e^{yt - t^2/2} = \sum_{k=0}^{\infty} \frac{H_k(y)}{k!} t^k \]

and the Sheffer identity is

\[ H_n(x + y) = \sum_{k=0}^{n} \binom{n}{k} H_k(x)\, y^{n-k} \]

We should remark that the Hermite polynomials, as defined in the
literature, often differ from our definition by a multiplicative
constant. □
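
The formula for the Hermite polynomials can be reproduced by applying the operator series of e^{-t^2/2} to x^n directly. The sketch below compares that computation with the closed form, using the normalization of this example; the function names are hypothetical and the representation of polynomials as coefficient lists is an assumption of the sketch.

```python
from fractions import Fraction
from math import factorial

def derivative(p):
    """Coefficient list of p'(x)."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def apply_operator(f, p):
    """f(t)p(x), where f = [c_0, c_1, ...] means sum c_k t^k and t = d/dx."""
    result = [Fraction(0)] * len(p)
    d = list(p)
    for c in f:
        for i, coeff in enumerate(d):
            result[i] += c * coeff
        d = derivative(d)
    return result

def hermite_by_operator(n):
    """H_n(x) = e^{-t^2/2} x^n, using the series coefficients up to t^n."""
    g_inv = [Fraction(-1, 2) ** (m // 2) / factorial(m // 2) if m % 2 == 0
             else Fraction(0) for m in range(n + 1)]
    return apply_operator(g_inv, [0] * n + [1])

def hermite_closed(n):
    """H_n(x) = sum_k (-1/2)^k (n)_{2k} / k! x^{n-2k}."""
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n // 2 + 1):
        falling = factorial(n) // factorial(n - 2 * k)    # (n)_{2k}
        coeffs[n - 2 * k] = Fraction(-1, 2) ** k * falling / factorial(k)
    return coeffs

print(hermite_closed(4))           # coefficients of x^4 - 6x^2 + 3, lowest degree first
print(all(hermite_by_operator(n) == hermite_closed(n) for n in range(8)))
```

Since the Hermite polynomials form an Appell sequence, differentiating H_5 should give 5 H_4, which offers one more consistency check on the output.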

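Example 16.8 The well-known and important Laguerre polynomials
$L_n^{(\alpha)}(x)$ of order $\alpha$ form the Sheffer sequence for the pair

\[ g(t) = (1 - t)^{-\alpha - 1}, \qquad f(t) = \frac{t}{t - 1} \]

It is possible to show (although we will not do so here) that

\[ L_n^{(\alpha)}(x) = \sum_{k=0}^{n} \frac{n!}{k!} \binom{\alpha + n}{n - k} (-x)^k \]

The generating function of the Laguerre polynomials is

\[ \frac{1}{(1 - t)^{\alpha + 1}}\, e^{yt/(t-1)} = \sum_{k=0}^{\infty} \frac{L_k^{(\alpha)}(y)}{k!} t^k \]

As with the Hermite polynomials, some definitions of the Laguerre
polynomials differ by a multiplicative constant. □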

We presume that the few examples we have given here indicate 
that the umbral calculus applies to a significant range of important 
polynomial sequences. In Roman [1984], we discuss approximately 30 
different sequences of polynomials that are (or are closely related to) 
Sheffer sequences. 



Umbral Operators and Umbral Shifts 

We have now established the basic framework of the umbral
calculus. As we have seen, the umbral algebra plays three roles: as the
algebra of formal power series in a single variable, as the algebra of all
linear functionals on $\mathcal{P}$, and as the algebra of all linear operators on $\mathcal{P}$
that commute with the derivative operator. Moreover, since $\mathcal{F}$ is an
algebra, we can consider geometric sequences in $\mathcal{F}$

\[ g, gf, gf^2, gf^3, \ldots \]

where $o(g) = 0$ and $o(f) = 1$. We have seen by example that the
orthogonality conditions

\[ \langle g(t)f^k(t) \mid s_n(x) \rangle = n!\,\delta_{n,k} \]

define important families of polynomial sequences.

While the machinery that we have developed so far does unify a
number of topics from the classical study of polynomial sequences (for
example, special cases of the expansion theorem include Taylor's
expansion, the Euler-Maclaurin formula and Boole's summation
formula), it does not provide much new insight into their study. Our
plan now is to take a brief look at some of the deeper results in the
umbral calculus, which center around the interplay between operators
on $\mathcal{P}$ and their adjoints, which are operators on the umbral algebra
$\mathcal{F} = \mathcal{P}^*$.

We begin by defining two important operators on $\mathcal{P}$ associated
to each Sheffer sequence.

Definition Let $s_n(x)$ be Sheffer for $(g(t), f(t))$. The linear operator
$\lambda_{g,f}\colon \mathcal{P} \to \mathcal{P}$ defined by

\[ \lambda_{g,f}(x^n) = s_n(x) \]

is called the Sheffer operator for the pair $(g(t), f(t))$, or for the sequence
$s_n(x)$. If $p_n(x)$ is the associated sequence for $f(t)$, the Sheffer operator

\[ \lambda_f(x^n) = p_n(x) \]

is called the umbral operator for $f(t)$, or for $p_n(x)$. □

Definition Let $s_n(x)$ be Sheffer for $(g(t), f(t))$. The linear operator
$\theta_{g,f}\colon \mathcal{P} \to \mathcal{P}$ defined by

\[ \theta_{g,f}[s_n(x)] = s_{n+1}(x) \]

is called the Sheffer shift for the pair $(g(t), f(t))$, or for the sequence
$s_n(x)$. If $p_n(x)$ is the associated sequence for $f(t)$, the Sheffer shift

\[ \theta_f[p_n(x)] = p_{n+1}(x) \]

is called the umbral shift for $f(t)$, or for $p_n(x)$. □

We will confine our attention in this brief introduction to umbral
operators and umbral shifts, rather than the more general Sheffer
operators and Sheffer shifts. It is clear that each Sheffer sequence
uniquely determines a Sheffer operator and vice versa. Hence, knowing
the Sheffer operator of a sequence is equivalent to knowing the
sequence.

Continuous Operators on the Umbral Algebra

It is clearly desirable that an operator $T \in \mathcal{L}(\mathcal{F})$ on the umbral
algebra $\mathcal{F}$ pass under infinite sums, that is,

\[ (16.6) \qquad T\Big( \sum_{k=0}^{\infty} a_k f_k(t) \Big) = \sum_{k=0}^{\infty} a_k T[f_k(t)] \]

whenever the sum on the left is defined, which is precisely when
$o(f_k(t)) \to \infty$ as $k \to \infty$. Not all operators on $\mathcal{F}$ have this property,
which leads to the following definition.

Definition A linear operator $T$ on the umbral algebra $\mathcal{F}$ is
continuous if it satisfies (16.6). □

The term continuous can be justified by defining a topology on $\mathcal{F}$.
However, since no additional topological concepts will be needed, we
will not do so here. Note that in order for (16.6) to make sense, we
must have $o(T[f_k(t)]) \to \infty$. It turns out that this condition is also
sufficient.

Theorem 16.12 A linear operator $T$ on $\mathcal{F}$ is continuous if and only if

\[ (16.7) \qquad o(f_k) \to \infty \;\Rightarrow\; o(T(f_k)) \to \infty \]

Proof. The necessity is clear. Suppose that (16.7) holds, and that
$o(f_k) \to \infty$. For any $m \ge 0$, we have

\[ (16.8) \qquad \Big\langle T \sum_{k=0}^{\infty} a_k f_k(t) \;\Big|\; x^n \Big\rangle
 = \Big\langle T \sum_{k=0}^{m} a_k f_k(t) \;\Big|\; x^n \Big\rangle
 + \Big\langle T \sum_{k > m} a_k f_k(t) \;\Big|\; x^n \Big\rangle \]

Since $o\big( \sum_{k > m} a_k f_k(t) \big) \to \infty$ as $m \to \infty$, (16.7) implies that we may choose $m$ large
enough so that

\[ o\Big( T \sum_{k > m} a_k f_k(t) \Big) > n \]

as well as

\[ o(T[f_k(t)]) > n \text{ for } k > m \]

Hence, (16.8) gives

\[ \Big\langle T \sum_{k=0}^{\infty} a_k f_k(t) \;\Big|\; x^n \Big\rangle
 = \Big\langle T \sum_{k=0}^{m} a_k f_k(t) \;\Big|\; x^n \Big\rangle
 = \Big\langle \sum_{k=0}^{m} a_k T[f_k(t)] \;\Big|\; x^n \Big\rangle
 = \Big\langle \sum_{k=0}^{\infty} a_k T[f_k(t)] \;\Big|\; x^n \Big\rangle \]

which implies the desired result. ∎

Operator Adjoints

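If $\tau\colon \mathcal{P} \to \mathcal{P}$ is a linear operator on $\mathcal{P}$, then its (operator) adjoint
$\tau^{\times}$ is an operator on $\mathcal{P}^* = \mathcal{F}$ defined by

\[ \tau^{\times}[h(t)] = h(t) \circ \tau \]

In the symbolism of the umbral calculus, this is

\[ \langle \tau^{\times} h(t) \mid p(x) \rangle = \langle h(t) \mid \tau p(x) \rangle \]

(We have reduced the number of parentheses used to aid clarity.)

Let us recall the basic properties of the adjoint from Chapter 3.

Theorem 16.13 For $\tau, \sigma \in \mathcal{L}(\mathcal{P})$,

1) $(\tau + \sigma)^{\times} = \tau^{\times} + \sigma^{\times}$

2) $(r\tau)^{\times} = r\tau^{\times}$ for any $r \in \mathbb{C}$

3) $(\tau\sigma)^{\times} = \sigma^{\times}\tau^{\times}$

4) $(\tau^{-1})^{\times} = (\tau^{\times})^{-1}$ for invertible $\tau \in \mathcal{L}(\mathcal{P})$ ∎

Thus, the map $\phi\colon \mathcal{L}(\mathcal{P}) \to \mathcal{L}(\mathcal{F})$ that sends $\tau\colon \mathcal{P} \to \mathcal{P}$ to its adjoint
$\tau^{\times}\colon \mathcal{F} \to \mathcal{F}$ is a linear transformation from $\mathcal{L}(\mathcal{P})$ to $\mathcal{L}(\mathcal{F})$. Moreover,
since $\tau^{\times} = 0$ implies that $\langle h(t) \mid \tau p(x) \rangle = 0$ for all $h(t) \in \mathcal{F}$ and
$p(x) \in \mathcal{P}$, which in turn implies that $\tau = 0$, we deduce that $\phi$ is
injective. The next theorem describes the range of $\phi$.

Theorem 16.14 A linear operator $T \in \mathcal{L}(\mathcal{F})$ is the adjoint of a linear
operator $\tau \in \mathcal{L}(\mathcal{P})$ if and only if $T$ is continuous.

Proof. First, suppose that $T = \tau^{\times}$ for some $\tau \in \mathcal{L}(\mathcal{P})$. If
$o(f_k(t)) \to \infty$, then for any $n \ge 0$, there is a $k_n$ for which

\[ k > k_n \;\Rightarrow\; o(f_k(t)) > \deg \tau(x^i) \text{ for all } 0 \le i \le n \]

Hence,

\[ k > k_n \;\Rightarrow\; \langle \tau^{\times} f_k(t) \mid x^i \rangle = \langle f_k(t) \mid \tau x^i \rangle = 0 \text{ for all } 0 \le i \le n
 \;\Rightarrow\; o(\tau^{\times} f_k(t)) > n \]

which shows that $o(\tau^{\times} f_k(t)) \to \infty$, and hence that $\tau^{\times}$ is continuous.

For the converse, assume that $T$ is continuous. We can define a
linear operator $\tau$ on $\mathcal{P}$ by setting

\[ \tau x^n = \sum_{k \ge 0} \frac{\langle T t^k \mid x^n \rangle}{k!}\, x^k \]

This makes sense since $o(T t^k) \to \infty$ as $k \to \infty$, and so the sum on the
right is a finite one. Then

\[ \langle \tau^{\times} t^m \mid x^n \rangle = \langle t^m \mid \tau x^n \rangle
 = \sum_{k \ge 0} \frac{\langle T t^k \mid x^n \rangle}{k!} \langle t^m \mid x^k \rangle
 = \langle T t^m \mid x^n \rangle \]

which implies that $T t^m = \tau^{\times} t^m$ for all $m \ge 0$. Finally, since $T$ and
$\tau^{\times}$ are both continuous, we have $T = \tau^{\times}$. ∎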

Automorphisms of the Umbral Algebra 

Figure 16.1 shows the map $\phi$, which is an isomorphism from the
vector space $\mathcal{L}(\mathcal{P})$ onto the space of all continuous linear operators on
$\mathcal{F}$. We are interested in determining the images of the set of all umbral
operators, and the set of all umbral shifts, under this isomorphism.

[Figure 16.1: the isomorphism $\phi$]

Let us begin with umbral operators. Suppose that $\lambda_f$ is the
umbral operator for the associated sequence $p_n(x)$, associated to the
delta series $f(t) \in \mathcal{F}$. Then

\[ \langle \lambda_f^{\times} f(t)^k \mid x^n \rangle = \langle f(t)^k \mid \lambda_f x^n \rangle = \langle f(t)^k \mid p_n(x) \rangle = n!\,\delta_{n,k} = \langle t^k \mid x^n \rangle \]

for all $k$ and $n$. Hence, $\lambda_f^{\times} f(t)^k = t^k$, which implies, since $\lambda_f^{\times}$ is
continuous, that

\[ \lambda_f^{\times} t^k = \bar{f}(t)^k \]

More generally, for any $h(t) \in \mathcal{F}$,

\[ (16.9) \qquad \lambda_f^{\times} h(t) = h(\bar{f}(t)) \]

In words, $\lambda_f^{\times}$ is composition by $\bar{f}(t)$.

From (16.9), we deduce that $\lambda_f^{\times}$ is a vector space isomorphism,
and that

\[ \lambda_f^{\times}[g(t)h(t)] = g(\bar{f}(t))\,h(\bar{f}(t)) = \lambda_f^{\times} g(t)\, \lambda_f^{\times} h(t) \]

Hence, $\lambda_f^{\times}$ is an automorphism of the umbral algebra $\mathcal{F}$. It is a
pleasant fact that this characterizes umbral operators. The first step in
the proof of this is the following, whose proof is left as an exercise.

Theorem 16.15 If $T$ is an automorphism of the umbral algebra, then
$T$ preserves order, that is, $o(Tf(t)) = o(f(t))$. In particular, $T$ is
continuous. ∎

Theorem 16.16 A linear operator $\lambda$ on $\mathcal{P}$ is an umbral operator if
and only if its adjoint is an automorphism of the umbral algebra $\mathcal{F}$.
Moreover, if $\lambda_f$ is an umbral operator, then

\[ \lambda_f^{\times} h(t) = h(\bar{f}(t)) \]

for all $h(t) \in \mathcal{F}$. In particular, $\lambda_f^{\times} f(t) = t$.

Proof. We have already shown that the adjoint of $\lambda_f$ is an
automorphism satisfying (16.9). For the converse, suppose that $\lambda^{\times}$ is
an automorphism of $\mathcal{F}$. Theorem 16.15 implies the existence of a
unique delta series $f(t)$ for which $\lambda^{\times} f(t) = t$. If $p_n(x)$ is the
associated sequence for $f(t)$, then

\[ \langle f(t)^k \mid \lambda x^n \rangle = \langle \lambda^{\times} f(t)^k \mid x^n \rangle = \langle [\lambda^{\times} f(t)]^k \mid x^n \rangle
 = \langle t^k \mid x^n \rangle = n!\,\delta_{n,k} = \langle f(t)^k \mid p_n(x) \rangle \]

and so part (6) of Theorem 16.2 implies that $\lambda x^n = p_n(x)$. Thus, $\lambda$ is
an umbral operator. ∎

Theorem 16.16 allows us to fill in one of the blank boxes on the 
right side of Figure 16.1, as shown in Figure 16.2. 




Let us see how we might use Theorem 16.16 to advantage in the
study of associated sequences. Since the set $\mathrm{Aut}(\mathcal{F})$ of all
automorphisms of $\mathcal{F}$ is a group under composition, so is the set of
umbral operators. More specifically, let

\[ \lambda_f\colon x^n \to p_n(x) \quad \text{and} \quad \lambda_g\colon x^n \to q_n(x) \]

be umbral operators. Then

\[ (\lambda_g \circ \lambda_f)^{\times} = \lambda_f^{\times} \circ \lambda_g^{\times} \]

is an automorphism of $\mathcal{F}$, and so $\lambda_g \circ \lambda_f$ is an umbral operator. In
fact, since

\[ (\lambda_g \circ \lambda_f)^{\times} f(g(t)) = \lambda_f^{\times} \circ \lambda_g^{\times} f(g(t)) = \lambda_f^{\times} f(t) = t \]

we deduce that $\lambda_g \circ \lambda_f = \lambda_{f \circ g}$. Also, since

\[ \lambda_{\bar{f}} \circ \lambda_f = \lambda_{f \circ \bar{f}} = \lambda_t = \iota \]

we have $\lambda_f^{-1} = \lambda_{\bar{f}}$.

Now, if

\[ p_n(x) = \sum_{k=0}^{n} p_{n,k} x^k \]

then $\lambda_g \circ \lambda_f$ is the umbral operator for the associated sequence

\[ (\lambda_g \circ \lambda_f)x^n = \lambda_g p_n(x) = \sum_{k=0}^{n} p_{n,k} \lambda_g x^k = \sum_{k=0}^{n} p_{n,k} q_k(x) \]

This sequence, denoted by

\[ (16.10) \qquad p_n(\mathbf{q}(x)) = \sum_{k=0}^{n} p_{n,k} q_k(x) \]

is called the umbral composition of $p_n(x)$ with $q_n(x)$. Let us
summarize.

Theorem 16.17 Let $p_n(x)$ and $q_n(x)$ be associated sequences, with
umbral operators $\lambda_f$ and $\lambda_g$, respectively.

1) $\lambda_g \circ \lambda_f = \lambda_{f \circ g}$ and $\lambda_f^{-1} = \lambda_{\bar{f}}$

2) The set of associated sequences forms a group under umbral
composition, as defined by (16.10). In particular, the umbral
composition $p_n(\mathbf{q}(x))$ is the associated sequence for the
composition $f \circ g$. The identity is the sequence $x^n$, and the
inverse of $p_n(x)$ is the associated sequence for the compositional
inverse $\bar{f}(t)$. ∎

Derivations of the Umbral Algebra 

We have seen that an operator on $\mathcal{P}$ is an umbral operator if and
only if its adjoint is an automorphism of $\mathcal{F}$. Now suppose that
$\theta_f \in \mathcal{L}(\mathcal{P})$ is the umbral shift for the associated sequence $p_n(x)$,
associated to the delta series $f(t) \in \mathcal{F}$. Then

\[ \begin{aligned}
\langle \theta_f^{\times} f(t)^k \mid p_n(x) \rangle &= \langle f(t)^k \mid \theta_f p_n(x) \rangle = \langle f(t)^k \mid p_{n+1}(x) \rangle \\
 &= (n+1)!\,\delta_{n+1,k} = (n+1)\,n!\,\delta_{n,k-1} = \langle k f(t)^{k-1} \mid p_n(x) \rangle
\end{aligned} \]

and so

\[ (16.11) \qquad \theta_f^{\times} f(t)^k = k f(t)^{k-1} \]

This implies

\[ (16.12) \qquad \theta_f^{\times}[f(t)^k f(t)^j] = \theta_f^{\times}[f(t)^k]\, f(t)^j + f(t)^k\, \theta_f^{\times}[f(t)^j] \]

and further, by continuity, that

\[ (16.13) \qquad \theta_f^{\times}[g(t)h(t)] = [\theta_f^{\times} g(t)]\,h(t) + g(t)\,[\theta_f^{\times} h(t)] \]

Let us pause for a definition.

Definition Let $\mathcal{A}$ be any algebra. A linear operator $\partial$ on $\mathcal{A}$ is a
derivation if

\[ \partial(ab) = (\partial a)b + a\,\partial b \]

for all $a, b \in \mathcal{A}$. □

Thus, we have shown that the adjoint of an umbral shift is a
derivation of the umbral algebra $\mathcal{F}$. Moreover, the expansion theorem
and (16.11) show that $\theta_f^{\times}$ is surjective. As with umbral operators, this
characterizes umbral shifts. First we need a preliminary result on
surjective derivations.

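Theorem 16.18 Let $\partial$ be a surjective derivation on the umbral algebra
$\mathcal{F}$. Then $\partial c = 0$ for any constant $c \in \mathcal{F}$ and $o(\partial f(t)) = o(f(t)) - 1$, if
$o(f(t)) \ge 1$. In particular, $\partial$ is continuous.

Proof. We begin by noting that $\partial 1 = \partial 1^2 = \partial 1 + \partial 1 = 2\partial 1$, and so
$\partial c = c\,\partial 1 = 0$ for all constants $c \in \mathcal{F}$. Since $\partial$ is surjective, there
must exist an $h(t) \in \mathcal{F}$ for which

\[ \partial h(t) = 1 \]

Writing $h(t) = h_0 + t h_1(t)$, we have

\[ 1 = \partial[h_0 + t h_1(t)] = (\partial t)h_1(t) + t\,\partial h_1(t) \]

which implies that $o(\partial t) = 0$. Finally, if $o(h(t)) = k \ge 1$, then $h(t) =
t^k h_1(t)$, where $o(h_1(t)) = 0$, and so

\[ o[\partial h(t)] = o[\partial\, t^k h_1(t)] = o[t^k\, \partial h_1(t) + k t^{k-1} h_1(t)\, \partial t] = k - 1 \]

∎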

Theorem 16.19 A linear operator $\theta$ on $\mathcal{P}$ is an umbral shift if and
only if its adjoint is a surjective derivation of the umbral algebra $\mathcal{F}$.
Moreover, if $\theta_f$ is an umbral shift, then $\theta_f^{\times} = \partial_f$ is derivation with
respect to $f(t)$, that is,

\[ \theta_f^{\times} f(t)^k = k f(t)^{k-1} \]

for all $k \ge 0$. In particular, $\theta_f^{\times} f(t) = 1$.

Proof. We have already seen that $\theta_f^{\times}$ is derivation with respect to
$f(t)$. For the converse, suppose that $\theta^{\times}$ is a surjective derivation.
Theorem 16.18 implies that there is a delta functional $f(t)$ such that
$\theta^{\times} f(t) = 1$. If $p_n(x)$ is the associated sequence for $f(t)$, then

\[ \begin{aligned}
\langle f(t)^k \mid \theta p_n(x) \rangle &= \langle \theta^{\times} f(t)^k \mid p_n(x) \rangle = \langle k f(t)^{k-1}\, \theta^{\times} f(t) \mid p_n(x) \rangle \\
 &= \langle k f(t)^{k-1} \mid p_n(x) \rangle = (n+1)!\,\delta_{n+1,k} = \langle f(t)^k \mid p_{n+1}(x) \rangle
\end{aligned} \]

Hence, $\theta p_n(x) = p_{n+1}(x)$, that is, $\theta = \theta_f$ is the umbral shift for
$p_n(x)$. ∎

Figure 16.2 is now justified. Let us summarize.

Theorem 16.20 The isomorphism $\phi$ from $\mathcal{L}(\mathcal{P})$ onto the continuous
linear operators on $\mathcal{F}$ is a bijection from the set of all umbral
operators to the set of all automorphisms of $\mathcal{F}$, as well as a bijection
from the set of all umbral shifts to the set of all surjective derivations
on $\mathcal{F}$. ∎

We have seen that the fact that the set of all automorphisms on
$\mathcal{F}$ is a group under composition shows that the set of all associated
sequences is a group under umbral composition. The set of all
surjective derivations on $\mathcal{F}$ does not form a group. However, we do
have the chain rule for derivations.

Theorem 16.21 (The chain rule) Let $\partial_f$ and $\partial_g$ be surjective
derivations on $\mathcal{F}$. Then

\[ \partial_g = (\partial_g f(t))\,\partial_f \]

Proof. This follows from

\[ \partial_g f(t)^k = k f(t)^{k-1}\, \partial_g f(t) = (\partial_g f(t))\,\partial_f f(t)^k \]

and so continuity implies the result. ∎

The chain rule leads to the following umbral result. 

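Theorem 16.22 If $\theta_f$ and $\theta_g$ are umbral shifts, then

\[ \theta_f = \theta_g \circ \partial_f g(t) \]

Proof. The chain rule gives

\[ \partial_f = (\partial_f g(t))\,\partial_g \]

and so

\[ \begin{aligned}
\langle h(t) \mid \theta_f p(x) \rangle &= \langle \theta_f^{\times} h(t) \mid p(x) \rangle = \langle (\partial_f g(t))\,\theta_g^{\times} h(t) \mid p(x) \rangle \\
 &= \langle \theta_g^{\times} h(t) \mid \partial_f g(t)\,p(x) \rangle = \langle h(t) \mid \theta_g \circ \partial_f g(t)\,p(x) \rangle
\end{aligned} \]

for all $p(x) \in \mathcal{P}$ and all $h(t) \in \mathcal{F}$, which implies the result. ∎

We leave it as an exercise to show that $\partial_g f(t) = [\partial_f g(t)]^{-1}$. Now,
by taking $g(t) = t$ in Theorem 16.22, and observing that $\theta_t x^n = x^{n+1}$
and so $\theta_t$ is multiplication by $x$, we get

\[ \theta_f = x\,\partial_f t = x[\partial_t f(t)]^{-1} = x[f'(t)]^{-1} \]

Applying this to the associated sequence $p_n(x)$ for $f(t)$ gives the
following important recurrence relation for $p_n(x)$.

Theorem 16.23 (The recurrence formula) Let $p_n(x)$ be the
associated sequence for $f(t)$. Then

\[ p_{n+1}(x) = x[f'(t)]^{-1} p_n(x) \]

∎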

Example 16.9 The recurrence relation can be used to find the
associated sequence for the forward difference functional $f(t) = e^t - 1$.
Since $f'(t) = e^t$, the recurrence relation is

\[ p_{n+1}(x) = x e^{-t} p_n(x) = x p_n(x - 1) \]

Using the fact that $p_0(x) = 1$, we have

\[ p_1(x) = x, \quad p_2(x) = x(x - 1), \quad p_3(x) = x(x - 1)(x - 2) \]

and so on, leading easily to the lower factorial polynomials

\[ p_n(x) = x(x - 1) \cdots (x - n + 1) = (x)_n \]

□
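
The recurrence in this example is easy to run mechanically. The sketch below iterates p_{n+1}(x) = x p_n(x - 1) on coefficient lists and compares the result with the lower factorials obtained by direct multiplication; the function names are illustrative, and storing polynomials as integer coefficient lists (lowest degree first) is an assumption of the sketch.

```python
from math import comb

def shift(p, a):
    """Coefficients of p(x + a), expanded with the binomial theorem."""
    out = [0] * len(p)
    for n, c in enumerate(p):
        for j in range(n + 1):
            out[j] += c * comb(n, j) * a ** (n - j)
    return out

def next_associated(p):
    """One step of p_{n+1}(x) = x * p_n(x - 1), the recurrence for f(t) = e^t - 1."""
    return [0] + shift(p, -1)

def lower_factorial_coeffs(n):
    """Coefficients of x(x - 1)...(x - n + 1), by direct multiplication."""
    p = [1]
    for i in range(n):
        # Multiply p by (x - i): new[k] = p[k - 1] - i * p[k].
        p = [(p[k - 1] if k > 0 else 0) - i * (p[k] if k < len(p) else 0)
             for k in range(len(p) + 1)]
    return p

seq = [[1]]                       # p_0(x) = 1
for _ in range(5):
    seq.append(next_associated(seq[-1]))

print(seq[3])                     # coefficients of x(x - 1)(x - 2) = x^3 - 3x^2 + 2x
print(all(seq[n] == lower_factorial_coeffs(n) for n in range(6)))
```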

Example 16.10 Consider the delta functional

\[ f(t) = \log(1 + t) \]

Since $\bar{f}(t) = e^t - 1$ is the forward difference functional, Theorem 16.17
implies that the associated sequence $\phi_n(x)$ for $f(t)$ is the inverse,
under umbral composition, of the lower factorial polynomials. Thus, if
we write

\[ \phi_n(x) = \sum_{k=0}^{n} S(n, k)\, x^k \]

then

\[ x^n = \sum_{k=0}^{n} S(n, k)\, (x)_k \]

The coefficients $S(n, k)$ in this equation are known as the Stirling
numbers of the second kind and have great combinatorial significance.
In fact, $S(n, k)$ is the number of partitions of a set of size $n$ into $k$
blocks. The polynomials $\phi_n(x)$ are called the exponential polynomials.

The recurrence relation for the exponential polynomials is

\[ \phi_{n+1}(x) = x(1 + t)\phi_n(x) = x(\phi_n(x) + \phi_n'(x)) \]

Equating coefficients of $x^k$ on both sides of this gives the well-known
formula for the Stirling numbers

\[ S(n+1, k) = S(n, k-1) + k S(n, k) \]

Many other properties of the Stirling numbers can be derived by umbral
means. □
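
The recurrence for the exponential polynomials gives a short way to tabulate Stirling numbers of the second kind. The following sketch generates the polynomials from the recurrence, reads off S(n, k) as coefficients, and spot-checks the expansion of x^n in lower factorials at a few integer points; all names are illustrative assumptions.

```python
def next_exponential(phi):
    """phi_{n+1}(x) = x * (phi_n(x) + phi_n'(x)), on coefficient lists."""
    dphi = [k * c for k, c in enumerate(phi)][1:] + [0]
    return [0] + [a + b for a, b in zip(phi, dphi)]

def lower_factorial(x, k):
    """(x)_k = x(x - 1)...(x - k + 1), with (x)_0 = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

# phi[n][k] is the coefficient of x^k in phi_n(x), that is, S(n, k).
phi = [[1]]
for _ in range(6):
    phi.append(next_exponential(phi[-1]))

def expansion_holds(n, x):
    """Check x^n = sum_k S(n, k) (x)_k at the integer point x."""
    return x ** n == sum(phi[n][k] * lower_factorial(x, k) for k in range(n + 1))

print(phi[4])                                   # S(4, 0), ..., S(4, 4)
print(all(expansion_holds(n, x) for n in range(7) for x in range(-3, 8)))
```

For n = 4 the row is 0, 1, 7, 6, 1, matching the count of partitions of a four-element set into one, two, three and four blocks.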



EXERCISES 

1. Prove that o(fg) = o(f) + o(g), for any f,g E 

2. Prove that o(f + g) > min{o(f),o(g)}, for any f,g G 

3. Show that any delta series has a compositional inverse. 

4. Show that for any delta series f, the sequence f^ is a pseudobasis. 

5. Prove that is a derivation. 

6. Show that f G ^ is a delta functional if and only if (f | 1} = 0 

and (f I x) / 0. 

7. Show that f G is invertible if and only if {f | 1) ^ 0. 

8. Show that (f(at) | p(x)) = (f(t) | p(ax)) for any a G C, f G ‘J and 

p G 

9. Show that (te^^ | p(x)) = P^(a) for any polynomial p(x) G 

10. Show that f = g in if and only if f = g as linear functionals, 
which holds if and only if f = g as linear operators. 

11. Prove that if Sj^(x) is Sheffer for (g(t),f(t)), then f(t)Sj^(x) = 
nSn_i(x). Hint Apply the functionals g(t)f^(t) to both sides. 

12. Verify that the Abel polynomials form the associated sequence for 
the Abel functional. 



13. Show that a sequence $s_n(x)$ is the Appell sequence for g(t) if 
and only if $s_n(x) = g(t)^{-1}x^n$. 

14. If f is a delta series, show that the adjoint $\lambda_f^{\times}$ of the 
umbral operator $\lambda_f$ is a vector space isomorphism of $\mathcal{F}$. 

15. Prove that if T is an automorphism of the umbral algebra, then 
T preserves order, that is, o(Tf(t)) = o(f(t)). In particular, T is 
continuous. 



16. Show that an umbral operator maps associated sequences to 
associated sequences. 

17. Let $p_n(x)$ and $q_n(x)$ be associated sequences. Define a linear 
operator $\alpha$ by $\alpha: p_n(x) \to q_n(x)$. Show that $\alpha$ is an umbral 
operator. 

18. Prove that if $\partial_f$ and $\partial_g$ are surjective derivations on 
$\mathcal{F}$, then 

$$\partial_f g(t) = [\partial_g f(t)]^{-1}.$$



Accumulation point, 245 
adjoint, 175 
Hilbert space, 176 
operator, 77 

affine, combination, 317 
geometry, 315 
dimension of, 315 
group, 324 
hull, 318 
map, 322 
subspace, 43 
transformation, 322 
affinely independent, 322 
affinity, 322 
algebra, 46 

algebraically closed, 140, 220 
algebraically reflexive, 73 
annihilator, 74, 110 
Apollonius' identity, 173 
ascending chain condition, on ideals, 102 
on modules, 101 
associates, 21 



Barycentric coordinates, 322 
basis, dual, 70 
for a module, 90 
for a vector space, 37 
Hamel, 165 
Hilbert, 165, 274 
ordered, 41 
orthogonal, 217 
orthogonal Hamel, 166 
orthonormal, 219 
orthonormal Hamel, 166 
standard, 37, 50, 93 
Bessel's identity, 167 
Bessel’s inequality, 167, 275, 277, 283 
best approximation, 271 
bijection, 5 
bilinearity, 158 
binomial identity, 339 

Cancellation law, 17 
canonical form, 6, 123 
Jordan, 142 
rational, 131 
canonical injection, 292 
canonical map, 72 
cardinality, 10 
cartesian product, 12, 294 
Cauchy sequence, 249 
Cauchy-Schwarz inequality, 159, 241, 264 
chain rule, 350 
characteristic, 24 
characteristic equation, 139 
characteristic value (see eigenvalue) 
characteristic vector (see eigenvector) 
closed, 242 
closed ball, 242 
closure, 244 
codimension, 68 
complement, 33, 86 
orthogonal, 164 

congruent, modulo a subspace, 63 
conjugate linear map, 173 
conjugate linearity, 158 
conjugate space, 287 
convergence, 244 
convex set, 270 
coset (see also flat), 64, 315 
coset representative, 64, 316 
countable, 10 
countably infinite, 10 



Dense, 246 
derivation, 349 

diagonalizability, simultaneous, 156 
diagram, commutative, 293 
diameter, 258 
dimension, 39 
Hamel, 166 
Hilbert, 165, 284 
projective, 325 
direct product, 31 
direct sum, 32 
external, 32 

universal property of, 296 
internal, 33 
orthogonal, 169, 214 
direct summand, 33 
distance, 161 
divides, 20 
division algorithm, 3 
dot product, 158 
dual space, algebraic, 69, 211 
continuous, 287 



double, 72 



Eigenspace, 138 
eigenvalue, 137, 138 

algebraic multiplicity of, 143 
geometric multiplicity of, 143 
eigenvector, 138 
elementary divisors, 117, 128 
endomorphism, of modules, 90 
of vector spaces, 46 
epimorphism, of modules, 90 
of vector spaces, 46 
equivalence class, 5 
equivalence relation, 5 
Euclidean space, 158 
evaluation at v, 72 
extension map, 306 

exterior product, universal property of, 312 
exterior product space, 312 



Field, 23 
finite, 10 
flat, 315 

dimension of, 316 
generated by a set, 318 
hyperplane, 316 
line, 316 
parallel, 316 
plane, 316 
point, 316 

flat representative, 316 
form, bilinear, 205, 297 
discriminant of, 209 
rank of, 209 
universal, 222 
multilinear, 308 

formal power series, composition of, 330 
delta series, 330 
order of, 330 
Fourier coefficient, 167 
Fourier expansion, 167, 276, 282 
function, bijective, 5 
bilinear, 297 
continuous, 248 
domain of, 4 
image of, 4 
injective, 4 
multilinear, 308 
n-linear, 308 
range of, 4 
restriction of, 5 
square summable, 285 
surjective, 5 

functional (see linear functional) 

functional calculus, 195 



Gaussian coefficient, 81 
generate, 35, 87 
generating function, 338 
Gram-Schmidt orthogonalization, 170, 171 
greatest lower bound, 9 
group, 15 
abelian, 15 
commutative, 15 
zero element of, 15 



Hamming distance, 258 
Hilbert space, 265 
total subset of, 274 
Hölder's inequality, 241, 261 
homomorphism, of modules, 90 
of vector spaces, 46 
hyperbolic pair, 216 
hyperbolic plane, 216 
hyperbolic space, 216 
maximal, 233ff. 



Ideal, 18 
maximal, 21, 98 
order, 111 
prime, 105 
principal, 18 
index set, 49 

infinite series, absolute convergence of, 268 
convergence of, 268, 277 
net convergence of, 277 
partial sum of, 268 
unconditional convergence of, 276 
injection, 4 

inner product, 157, 206 
standard, 158 
inner product space, 158 
integral domain, 17 
invariant, 6 
complete, 6 
complete system of, 6 
invariant factor, 118 
irreducible, 21 

isometric, metric spaces, 253 
metric vector spaces, 225 
isometry, of inner product spaces, 162, 264 
of metric spaces, 253 
of metric vector spaces, 225 
isomorphic, isometrically, 162, 264 
vector spaces, 48 
isomorphism, isometric, 162, 264 
of modules, 90 
of vector spaces, 46, 48 

Jordan block, 141 
Jordan canonical form, 142 

Kronecker delta function, 70 



Lattice, 31 
least upper bound, 9 
limit, 244 
limit point, 245 
linear combination, 28 
linear functional, 69 
Abel, 334 
delta, 333 
evaluation, 332 
forward difference, 334 
invertible, 333 
linear operator, 45 
Abel, 336 
adjoint of, 77, 175, 176 
delta, 335 
diagonalizable, 123 
direct sum of, 60 
forward difference, 336 
Hermitian, 180 
involution, 155 
minimal polynomial of, 124 
nilpotent, 154 
nonderogatory, 154 
nonnegative, 197 
normal, 180 
orthogonal spectral resolution of, 194, 195 
orthogonally diagonalizable, 179, 186ff. 
polar decomposition of, 199 
positive, 197 
projection (see projection) 
self-adjoint, 180 
Sheffer, 343 
spectral resolution of, 153 
spectrum of, 153 
square root of, 197 
translation, 336 
umbral, 343 
unitary, 180 
linear transformation, 45 
adjoint of, 77, 175, 176 
bounded, 287 
external direct sum of, 61 
image of, 48 
kernel of, 48 
matrix of, 54 
nullity of, 48 
operator adjoint of, 77 
orthogonal, 225 
determinant of, 227 
rank of, 48 
reflection, 227 
restriction of, 47 
rotation, 227 
symmetry, 227 
symplectic, 226 
tensor product of, 303 
unipotent, 237 
linearly dependent, 35, 89 
linearly independent, 35, 89 
linearly ordered set, 9 



Matrix, adjoint of, 3 
alternate, 208 
block, 129 
block diagonal, 129 
change of basis, 53 
column rank of, 40 
column space of, 40 
companion, 126 
congruent, 8, 208 
conjugate transpose of, 177 
coordinate, 42 
elementary, 2 
equivalent, 7, 58 
Hermitian, 181 
leading entry, 2 
minimal polynomial of, 125 
normal, 181 
of a bilinear form, 208 
of a linear transformation, 54 
orthogonal, 181 
rank of, 41 
reduced row echelon form, 2 
row equivalent, 2 
row rank of, 40 
row space of, 40 
similar, 8, 59, 122 
skew-Hermitian, 181 
skew-symmetric, 1, 181 
standard, 52 
symmetric, 1, 181 
trace of, 155 
transpose of, 1-2 
unitary, 181 
maximal element, 9 
maximal ideal, 21 
metric, 239 
Euclidean, 240 
sup, 240 
unitary, 240 
metric space, 161, 239 
bounded subset of, 259 
complete, 250 
complete subspace of, 250 
completion of, 254 
convergence in, 243 
dense subset of, 246 
distance between subsets in, 259 
separable, 246 
subspace of, 242 
metric vector space, 206 
anisotropic, 214 
group of, 225 
isometric, 225 
isotropic, 214 
nondegenerate, 206, 214 
nonsingular, 206, 214 
radical of, 213, 214 
totally isotropic, 214 
Minkowski space, 206 
Minkowski's inequality, 241, 261 
modular law, 43 
module, 84 
basis for, 90 
complement of, 86 
direct sum of, 86 
direct summand of, 86 
finitely generated, 87 
free, 91 

noetherian, 101 
primary, 111 
quotient, 97 
rank of, 93 
torsion, 108 
torsion element of, 95 
torsion free, 108 
monomorphism, of modules, 90 
of vector spaces, 46 

Natural map, 72 
neighborhood, open, 242 
net convergence, 277 
norm, 159, 161 
p-norm, 241 

normed linear space, 161 



Open, 242, 243 
open ball, 242 

operator (see linear operator) 
order, 111 
orthogonal, 164 
orthogonal complement, 164 
orthogonal geometry, 206 
orthogonal set, 164 
orthogonal transformation, 225 
determinant of, 227 
orthonormal set, 164 



Parallelogram law, 160, 264 
Parseval's identity, 167, 283 
partial order, 8 
partially ordered set, 8 
partition, 5 
blocks of, 5 
permutation, 310 
parity of, 311 
sign of, 311 
p-norm, 241 

polarization identities, 161 
polynomial(s), Abel, 341 
characteristic, 136, 137 
degree of, 3 
exponential, 352 
greatest common divisor of, 4 
Hermite, 174, 341 
irreducible, 4 
Laguerre, 342 
leading coefficient of, 3 
Legendre, 172 
lower factorial, 340 
minimal, 124, 125 
monic, 3 

relatively prime, 4 
split, 140 
power set, 11 
prime, 21 
principal ideal, 18 
principal ideal domain, 19 
projection(s), 62, 145 
canonical, 65 
modulo a subspace, 65 
natural, 65 

onto a subspace, 68, 145 
orthogonal, 147, 190 
projective dimension, 325 
projective geometry, 325 
projective line, 325 
projective plane, 325 
projective point, 325 
pseudobasis, 330 



Quadratic form, 210 
quotient space, 64 
dimension of, 68 



Rank, of a bilinear form, 209 
of a linear transformation, 48 
of a matrix, 40, 41 
of a module, 93 
rational canonical form, 131 
recurrence formula, 351 
reflection, 227 

resolution of the identity, 150 
orthogonal, 193 
ring, 16 

characteristic of, 24 
commutative, 16 
noetherian, 103 
quotient, 98 
subring, 16 
with identity, 16 
rotation, 227 



Scalar, 27, 84 
sequence, Appell, 337 
associated, 337 

conjugate representation of, 339 
generating function of, 338 
operator characterization of, 339 
recurrence relation for, 351 
Cauchy, 249 
Sheffer, 337 



conjugate representation of, 339 
generating function of, 339 
operator characterization of, 339 
sesquilinearity, 158 
Sheffer identity, 339 
Sheffer operator, 343 
Sheffer sequence, 337 
Sheffer shift, 343 
similarity class, 59, 122 
span, 35, 87 

spectral resolution (see linear operator) 
spectrum, 153 
sphere, 242 

standard basis, 37, 50, 93 
standard vector, 37 
Stirling numbers, 351 
subfield, 43 
submodule, 85 
cyclic, 87 
subring, 16 
subspace(s), 29 
affine, 43 
complement of, 33 
cyclic, 127 
direct sum of, 33 
invariant, 60 
number of, 81 
orthogonal, 213 

orthogonal complement of, 164, 213 
sum of, 31 
zero, 30 

support, of a binary sequence, 73 
of a function, 32 
surjection, 4 

Sylvester's law of inertia, 221 
symmetry, 227 
symplectic geometry, 206 
symplectic transformation, 226 



Tensor product, 298, 303, 308 
universal property of, 299, 308 
theorem, Cantor's, 11 
Cayley-Hamilton, 140 
cyclic decomposition, 112, 117, 118, 128 
expansion, 338 
first isomorphism, 67 
Hilbert basis, 104 
primary decomposition, 111 
projection, 168, 272 
rank plus nullity, 51 
Riesz representation, 172, 211, 288 
Schröder-Bernstein, 11 
second isomorphism, 68 
spectral for normal operators, 194 
spectral resolution for self-adjoint 
operators, 194 
third isomorphism, 68 
Witt's cancellation, 229 
Witt's extension, 233 
topological space, 243 
topology, 243 
induced by a metric, 243 
torsion element, 95, 108 
total subset, 274 
totally ordered set, 9 
translation, 323 
transposition, 310 
triangle inequality, 160, 240, 264 



Umbral algebra, 332 
umbral composition, 348 
umbral shift, 343 
unit, 21 

unitary space, 158 
upper bound, 9 



Vandermonde convolution formula, 341 
vector space, 27 
basis for, 37 
dimension of, 39 
direct product of, 31 
external direct sum of, 32 
finite dimensional, 39 
free, 292 
universal property of, 293 
infinite dimensional, 39 
isomorphic, 48 
ordered basis for, 41 
quotient space, 64 
tensor product of, 298 
vector(s), 27 
isotropic, 211, 214 
length of, 159 
linearly dependent, 35 
linearly independent, 35 
norm of, 159 
null, 211, 214 
orthogonal, 164, 211 
span of, 35 
unit, 159 



Wedge product, 312 
weight, 30 
Witt index, 233 

Zero divisor, 17 
Zorn's lemma, 9 